I just restarted an OSD node and none of the admin sockets showed up on reboot (though it joined the cluster fine and all OSDs are happy). The node is an Ubuntu 12.04.3 system originally deployed via ceph-deploy on dumpling.
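For anyone who wants to check their own nodes, here is a rough sketch of how the missing sockets could be spotted. It assumes the default layout: OSD data directories named /var/lib/ceph/osd/ceph-<id> and admin sockets named /var/run/ceph/ceph-osd.<id>.asok. The function name and its parameters are made up for illustration, and a data directory existing does not by itself prove that the OSD is running.

```python
import os
import re


def missing_admin_sockets(osd_root="/var/lib/ceph/osd",
                          run_dir="/var/run/ceph",
                          cluster="ceph"):
    """Return sorted OSD ids that have a data dir but no admin socket.

    Assumes the default naming scheme (an assumption, adjust if your
    ceph.conf overrides 'admin socket'):
      data dir:     <osd_root>/<cluster>-<id>
      admin socket: <run_dir>/<cluster>-osd.<id>.asok
    """
    pattern = re.compile(r"^%s-(\d+)$" % re.escape(cluster))
    osd_ids = []
    if os.path.isdir(osd_root):
        for entry in os.listdir(osd_root):
            match = pattern.match(entry)
            if match:
                osd_ids.append(int(match.group(1)))

    missing = []
    for osd_id in sorted(osd_ids):
        sock = os.path.join(run_dir, "%s-osd.%d.asok" % (cluster, osd_id))
        if not os.path.exists(sock):
            missing.append(osd_id)
    return missing


if __name__ == "__main__":
    print("OSDs without an admin socket:", missing_admin_sockets())
```

Run on the OSD host after a reboot and again after a service restart; if the behavior described here applies, the first run should list the local OSDs and the second should list none.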
The only thing that stands out to me is the lock_fsid failure and the "error converting store" message. Here is the snippet from OSD 19's log for a full reboot, starting with the shutdown complete entry and running through the reconnect messages:

2013-11-12 09:44:00.757576 7fb8a8e24780 1 -- 192.168.200.54:6819/23261shutdown complete.
2013-11-12 09:47:05.843425 7f7918e9d780 0 ceph version 0.72 (5832e2603c7db5d40b433d0953408993a9b7c217), process ceph-osd, pid 1734
2013-11-12 09:47:05.892704 7f7918e9d780 1 filestore(/var/lib/ceph/osd/ceph-19) mount detected xfs
2013-11-12 09:47:05.892718 7f7918e9d780 1 filestore(/var/lib/ceph/osd/ceph-19) disabling 'filestore replica fadvise' due to known issues with fadvise(DONTNEED) on xfs
2013-11-12 09:47:05.944312 7f7918e9d780 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features: FIEMAP ioctl is supported and appears to work
2013-11-12 09:47:05.944327 7f7918e9d780 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2013-11-12 09:47:05.944743 7f7918e9d780 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2013-11-12 09:47:06.258005 7f7918e9d780 0 filestore(/var/lib/ceph/osd/ceph-19) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2013-11-12 09:47:07.567405 7f7918e9d780 1 journal _open /var/lib/ceph/osd/ceph-19/journal fd 19: 10239344640 bytes, block size 4096 bytes, directio = 1, aio = 1
2013-11-12 09:47:07.570098 7f7918e9d780 1 journal _open /var/lib/ceph/osd/ceph-19/journal fd 19: 10239344640 bytes, block size 4096 bytes, directio = 1, aio = 1
2013-11-12 09:47:07.570352 7f7918e9d780 1 journal close /var/lib/ceph/osd/ceph-19/journal
2013-11-12 09:47:07.571215 7f7918e9d780 1 filestore(/var/lib/ceph/osd/ceph-19) mount detected xfs
2013-11-12 09:47:07.572742 7f7918e9d780 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features: FIEMAP ioctl is supported and appears to work
2013-11-12 09:47:07.572750 7f7918e9d780 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2013-11-12 09:47:07.573234 7f7918e9d780 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2013-11-12 09:47:07.574879 7f7918e9d780 0 filestore(/var/lib/ceph/osd/ceph-19) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2013-11-12 09:47:07.577043 7f7918e9d780 1 journal _open /var/lib/ceph/osd/ceph-19/journal fd 23: 10239344640 bytes, block size 4096 bytes, directio = 1, aio = 1
2013-11-12 09:47:07.578649 7f7918e9d780 1 journal _open /var/lib/ceph/osd/ceph-19/journal fd 23: 10239344640 bytes, block size 4096 bytes, directio = 1, aio = 1
2013-11-12 09:47:07.680531 7f7918e9d780 0 <cls> cls/hello/cls_hello.cc:271: loading cls_hello
2013-11-12 09:47:09.670813 7f8151b5f780 0 ceph version 0.72 (5832e2603c7db5d40b433d0953408993a9b7c217), process ceph-osd, pid 2769
2013-11-12 09:47:09.673789 7f8151b5f780 0 filestore(/var/lib/ceph/osd/ceph-19) lock_fsid failed to lock /var/lib/ceph/osd/ceph-19/fsid, is another ceph-osd still running? (11) Resource temporarily unavailable
2013-11-12 09:47:09.673804 7f8151b5f780 -1 filestore(/var/lib/ceph/osd/ceph-19) FileStore::mount: lock_fsid failed
2013-11-12 09:47:09.673919 7f8151b5f780 -1 ** ERROR: error converting store /var/lib/ceph/osd/ceph-19: (16) Device or resource busy
2013-11-12 09:47:14.169305 7f78fd548700 0 -- 10.200.1.54:6802/1734 >> 10.200.1.51:6800/13263 pipe(0x1e48c80 sd=42 :55275 s=2 pgs=5530 cs=1 l=0 c=0x1eae2c0).fault, initiating reconnect
2013-11-12 09:47:14.169444 7f78fd346700 0 -- 10.200.1.54:6802/1734 >> 10.200.1.57:6804/8226 pipe(0xc1ed500 sd=43 :47978 s=2 pgs=16845 cs=1 l=0 c=0x1eae840).fault, initiating reconnect
2013-11-12 09:47:14.169988 7f78fd144700 0 -- 10.200.1.54:6802/1734 >> 10.200.1.59:6810/4862 pipe(0xc1ed280 sd=46 :37094 s=2 pgs=42297 cs=1 l=0 c=0x1eae6e0).fault, initiating reconnect

And here is roughly the same snippet from just doing a 'sudo restart ceph-osd-all':

2013-11-12 09:56:36.658014 7f7918e9d780 1 -- 192.168.200.54:6811/1734shutdown complete.
2013-11-12 09:56:37.556988 7f3793c21780 0 ceph version 0.72 (5832e2603c7db5d40b433d0953408993a9b7c217), process ceph-osd, pid 13723
2013-11-12 09:56:37.559314 7f3793c21780 1 filestore(/var/lib/ceph/osd/ceph-19) mount detected xfs
2013-11-12 09:56:37.559319 7f3793c21780 1 filestore(/var/lib/ceph/osd/ceph-19) disabling 'filestore replica fadvise' due to known issues with fadvise(DONTNEED) on xfs
2013-11-12 09:56:37.561350 7f3793c21780 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features: FIEMAP ioctl is supported and appears to work
2013-11-12 09:56:37.561360 7f3793c21780 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2013-11-12 09:56:37.562357 7f3793c21780 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2013-11-12 09:56:37.571030 7f3793c21780 0 filestore(/var/lib/ceph/osd/ceph-19) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2013-11-12 09:56:37.574273 7f3793c21780 1 journal _open /var/lib/ceph/osd/ceph-19/journal fd 23: 10239344640 bytes, block size 4096 bytes, directio = 1, aio = 1
2013-11-12 09:56:37.578189 7f3793c21780 1 journal _open /var/lib/ceph/osd/ceph-19/journal fd 23: 10239344640 bytes, block size 4096 bytes, directio = 1, aio = 1
2013-11-12 09:56:37.578854 7f3793c21780 1 journal close /var/lib/ceph/osd/ceph-19/journal
2013-11-12 09:56:37.579638 7f3793c21780 1 filestore(/var/lib/ceph/osd/ceph-19) mount detected xfs
2013-11-12 09:56:37.581110 7f3793c21780 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features: FIEMAP ioctl is supported and appears to work
2013-11-12 09:56:37.581118 7f3793c21780 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2013-11-12 09:56:37.582014 7f3793c21780 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2013-11-12 09:56:37.583365 7f3793c21780 0 filestore(/var/lib/ceph/osd/ceph-19) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2013-11-12 09:56:37.585765 7f3793c21780 1 journal _open /var/lib/ceph/osd/ceph-19/journal fd 24: 10239344640 bytes, block size 4096 bytes, directio = 1, aio = 1
2013-11-12 09:56:37.588281 7f3793c21780 1 journal _open /var/lib/ceph/osd/ceph-19/journal fd 24: 10239344640 bytes, block size 4096 bytes, directio = 1, aio = 1
2013-11-12 09:56:37.589782 7f3793c21780 0 <cls> cls/hello/cls_hello.cc:271: loading cls_hello
2013-11-12 09:56:39.723134 7f377488b700 0 -- 10.200.1.54:6807/13723 >> 10.200.1.56:6806/563 pipe(0xc87ca00 sd=155 :38290 s=1 pgs=17864 cs=2 l=0 c=0xc893160).fault
2013-11-12 09:56:39.728798 7f3775194700 0 -- 10.200.1.54:6807/13723 >> 10.200.1.52:6808/14464 pipe(0xc811000 sd=52 :51030 s=1 pgs=7473 cs=6 l=0 c=0xc7fbb00).fault
2013-11-12 09:56:39.807114 7f37787ca700 0 -- 10.200.1.54:6807/13723 >> 10.200.1.52:6805/14449 pipe(0xc756280 sd=72 :46552 s=1 pgs=10912 cs=96 l=0 c=0xc740420).fault
2013-11-12 09:56:39.852465 7f3778ccf700 0 -- 10.200.1.54:6807/13723 >> 10.200.1.57:6804/8226 pipe(0x2427780 sd=83 :48234 s=1 pgs=17251 cs=128 l=0 c=0x2406dc0).fault
2013-11-12 09:56:39.898327 7f377488b700 0 -- 10.200.1.54:6807/13723 >> 10.200.1.56:6806/563 pipe(0xc87ca00 sd=42 :40942 s=1 pgs=17945 cs=164 l=0 c=0xc893160).fault
2013-11-12 09:56:40.738437 7f3775ea1700 0 -- 10.200.1.54:6807/13723 >> 10.200.1.60:6810/32089 pipe(0xc7c2500 sd=72 :40289 s=2 pgs=33225 cs=109 l=0 c=0xc7fb840).fault with nothing to send, going to standby
2013-11-12 09:56:40.740185 7f376b2fd700 0 -- 10.200.1.54:6807/13723 >> 10.200.1.60:6810/32089 pipe(0xcd66a00 sd=279 :6807 s=0 pgs=0 cs=0 l=0 c=0xc79d000).accept connect_seq 0 vs existing 109 state standby
2013-11-12 09:56:40.740201 7f376b2fd700 0 -- 10.200.1.54:6807/13723 >> 10.200.1.60:6810/32089 pipe(0xcd66a00 sd=279 :6807 s=0 pgs=0 cs=0 l=0 c=0xc79d000).accept peer reset, then tried to connect to us, replacing
2013-11-12 09:56:41.639911 7f376fd47700 0 -- 192.168.200.54:6806/13723 >> 192.168.48.127:0/234188561 pipe(0xcf87a00 sd=127 :6806 s=0 pgs=0 cs=0 l=0 c=0xcb80580).accept peer addr is really 192.168.48.127:0/234188561 (socket is 192.168.48.127:60893/0)
2013-11-12 09:56:44.394952 7f37657a3700 0 -- 10.200.1.54:6807/13723 >> 10.200.1.54:6810/13792 pipe(0xcee7c80 sd=160 :6807 s=0 pgs=0 cs=0 l=0 c=0xd0d7160).accept connect_seq 0 vs existing 0 state connecting
2013-11-12 09:56:59.334100 7f3764396700 0 -- 192.168.200.54:6806/13723 >> 192.168.48.102:0/663636012 pipe(0xdbb9280 sd=197 :6806 s=0 pgs=0 cs=0 l=0 c=0xdbbc000).accept peer addr is really 192.168.48.102:0/663636012 (socket is 192.168.48.102:35496/0)
2013-11-12 09:57:45.805456 7f3764194700 0 -- 192.168.200.54:6806/13723 >> 192.168.48.103:0/1090276439 pipe(0xdbb9000 sd=180 :6806 s=0 pgs=0 cs=0 l=0 c=0xce83dc0).accept peer addr is really 192.168.48.103:0/1090276439 (socket is 192.168.48.103:41220/0)

After the 'restart ceph-osd-all' the admin sockets for all 4 OSDs on this host are present.

Let me know if there is additional logging or assistance I can provide to narrow it down.

Thanks,
Berant

On Tue, Nov 12, 2013 at 4:03 AM, Joao Luis wrote:
>
> On Nov 12, 2013 2:38 AM, "Berant Lemmenes" wrote:
> >
> > I noticed the same behavior on my dumpling cluster. They wouldn't show
> > up after boot, but after a service restart they were there.
> >
> > I haven't tested a node reboot since I upgraded to emperor today. I'll
> > give it a shot tomorrow.
> >
> > Thanks,
> > Berant
> >
> > On Nov 11, 2013 9:29 PM, "Peter Matulis" wrote:
> >>
> >> After upgrading from Dumpling to Emperor on Ubuntu 12.04 I noticed the
> >> admin sockets for each of my monitors were missing although the cluster
> >> seemed to continue running fine. There wasn't anything under
> >> /var/run/ceph. After restarting the service on each monitor node they
> >> reappeared. Anyone?
> >>
> >> ~pmatulis
> >>
>
> Odd behavior. The monitors do remove the admin socket on shutdown and
> proceed to create it when they start, but as long as they are running it
> should exist. Have you checked the logs for some error message that could
> provide more insight on the cause?
>
> -Joao
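As a rough aid for the log check suggested in the quoted reply, the sketch below scans OSD log lines for startup banners ("process ceph-osd, pid N") and for lock_fsid failures. It assumes the line layout seen in the excerpts in this message (date, time, thread id, level, message) and ties a failure to a pid only when it is logged on the same thread as that pid's startup banner. The function name is hypothetical.

```python
import re

# Layout assumed from the excerpts: "<date> <time> <thread id> <level> <message>".
ENTRY_RE = re.compile(r"^(\S+ \S+) ([0-9a-f]+)\s+(-?\d+)\s+(.*)$")
START_RE = re.compile(r"process ceph-osd, pid (\d+)")


def summarize_startups(lines):
    """Return (pids in startup order, set of pids that hit lock_fsid failures)."""
    started = []
    failed = set()
    pid_by_thread = {}
    for line in lines:
        entry = ENTRY_RE.match(line.strip())
        if entry is None:
            continue  # not a log entry in the assumed layout
        _timestamp, thread, _level, message = entry.groups()
        start = START_RE.search(message)
        if start:
            pid = int(start.group(1))
            pid_by_thread[thread] = pid
            started.append(pid)
        elif "lock_fsid failed" in message and thread in pid_by_thread:
            failed.add(pid_by_thread[thread])
    return started, failed


if __name__ == "__main__":
    sample = [
        "2013-11-12 09:47:05.843425 7f7918e9d780 0 ceph version 0.72 "
        "(5832e2603c7db5d40b433d0953408993a9b7c217), process ceph-osd, pid 1734",
        "2013-11-12 09:47:09.670813 7f8151b5f780 0 ceph version 0.72 "
        "(5832e2603c7db5d40b433d0953408993a9b7c217), process ceph-osd, pid 2769",
        "2013-11-12 09:47:09.673804 7f8151b5f780 -1 "
        "filestore(/var/lib/ceph/osd/ceph-19) FileStore::mount: lock_fsid failed",
    ]
    print(summarize_startups(sample))
```

On the reboot excerpt this flags pid 2769 as a second ceph-osd start for the same OSD that could not take the fsid lock while pid 1734 was already running. Whether that second start is related to the missing admin sockets is not something the log alone establishes.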
_______________________________________________
ceph-users mailing list
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
