Re: [ceph-users] No monitor sockets after upgrading to Emperor

Berant Lemmenes Tue, 12 Nov 2013 07:07:50 -0800

I just restarted an OSD node and none of the admin sockets showed up on
reboot (though it joined the cluster fine and all OSDs are happy). The node
is an Ubuntu 12.04.3 system originally deployed via ceph-deploy on dumpling.


The only things that stand out to me are the lock_fsid failure and the
"error converting store" message.
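
For reference, the "(11) Resource temporarily unavailable" in the log below
is EAGAIN, which is what a non-blocking advisory lock returns when another
process already holds it. The sketch below reproduces that in isolation. It
assumes the fsid lock is an fcntl-style record lock (not confirmed in this
thread); try_lock and second_process_errno are hypothetical helper names, and
the path is whatever scratch file you pass in, not a real OSD directory.

```python
import errno
import fcntl
import os


def try_lock(path):
    """Try a non-blocking exclusive fcntl lock; return (fd, 0) or (None, errno)."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return fd, 0
    except OSError as e:
        os.close(fd)
        return None, e.errno


def second_process_errno(path):
    """Hold the lock here, fork a child that tries to take it, return the child's errno."""
    fd, err = try_lock(path)
    if err != 0:
        raise OSError(err, "could not take the initial lock")
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: fcntl locks are not inherited across fork, so this attempt
        # conflicts with the lock still held by the parent.
        try:
            os.close(r)
            _, child_err = try_lock(path)
            os.write(w, str(child_err).encode())
        finally:
            os._exit(0)
    os.close(w)
    data = os.read(r, 16)
    os.waitpid(pid, 0)
    os.close(r)
    os.close(fd)  # closing the descriptor releases the parent's lock
    return int(data)
```

On Linux the child should report EAGAIN (11); POSIX also allows EACCES here.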

Here is a snippet from the OSD 19 log across a full reboot, starting with the
shutdown complete entry and running through the reconnect messages.

2013-11-12 09:44:00.757576 7fb8a8e24780  1 --
192.168.200.54:6819/23261shutdown complete.
2013-11-12 09:47:05.843425 7f7918e9d780  0 ceph version 0.72
(5832e2603c7db5d40b433d0953408993a9b7c217), process ceph-osd, pid 1734
2013-11-12 09:47:05.892704 7f7918e9d780  1
filestore(/var/lib/ceph/osd/ceph-19) mount detected xfs
2013-11-12 09:47:05.892718 7f7918e9d780  1
filestore(/var/lib/ceph/osd/ceph-19)  disabling 'filestore replica fadvise'
due to known issues with fadvise(DONTNEED) on xfs
2013-11-12 09:47:05.944312 7f7918e9d780  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features: FIEMAP
ioctl is supported and appears to work
2013-11-12 09:47:05.944327 7f7918e9d780  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features: FIEMAP
ioctl is disabled via 'filestore fiemap' config option
2013-11-12 09:47:05.944743 7f7918e9d780  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features:
syncfs(2) syscall fully supported (by glibc and kernel)
2013-11-12 09:47:06.258005 7f7918e9d780  0
filestore(/var/lib/ceph/osd/ceph-19) mount: enabling WRITEAHEAD journal
mode: checkpoint is not enabled
2013-11-12 09:47:07.567405 7f7918e9d780  1 journal _open
/var/lib/ceph/osd/ceph-19/journal fd 19: 10239344640 bytes, block size 4096
bytes, directio = 1, aio = 1
2013-11-12 09:47:07.570098 7f7918e9d780  1 journal _open
/var/lib/ceph/osd/ceph-19/journal fd 19: 10239344640 bytes, block size 4096
bytes, directio = 1, aio = 1
2013-11-12 09:47:07.570352 7f7918e9d780  1 journal close
/var/lib/ceph/osd/ceph-19/journal
2013-11-12 09:47:07.571215 7f7918e9d780  1
filestore(/var/lib/ceph/osd/ceph-19) mount detected xfs
2013-11-12 09:47:07.572742 7f7918e9d780  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features: FIEMAP
ioctl is supported and appears to work
2013-11-12 09:47:07.572750 7f7918e9d780  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features: FIEMAP
ioctl is disabled via 'filestore fiemap' config option
2013-11-12 09:47:07.573234 7f7918e9d780  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features:
syncfs(2) syscall fully supported (by glibc and kernel)
2013-11-12 09:47:07.574879 7f7918e9d780  0
filestore(/var/lib/ceph/osd/ceph-19) mount: enabling WRITEAHEAD journal
mode: checkpoint is not enabled
2013-11-12 09:47:07.577043 7f7918e9d780  1 journal _open
/var/lib/ceph/osd/ceph-19/journal fd 23: 10239344640 bytes, block size 4096
bytes, directio = 1, aio = 1
2013-11-12 09:47:07.578649 7f7918e9d780  1 journal _open
/var/lib/ceph/osd/ceph-19/journal fd 23: 10239344640 bytes, block size 4096
bytes, directio = 1, aio = 1
2013-11-12 09:47:07.680531 7f7918e9d780  0 <cls>
cls/hello/cls_hello.cc:271: loading cls_hello
2013-11-12 09:47:09.670813 7f8151b5f780  0 ceph version 0.72
(5832e2603c7db5d40b433d0953408993a9b7c217), process ceph-osd, pid 2769
2013-11-12 09:47:09.673789 7f8151b5f780  0
filestore(/var/lib/ceph/osd/ceph-19) lock_fsid failed to lock
/var/lib/ceph/osd/ceph-19/fsid, is another ceph-osd still running? (11)
Resource temporarily unavailable
2013-11-12 09:47:09.673804 7f8151b5f780 -1
filestore(/var/lib/ceph/osd/ceph-19) FileStore::mount: lock_fsid failed
2013-11-12 09:47:09.673919 7f8151b5f780 -1  ** ERROR: error converting
store /var/lib/ceph/osd/ceph-19: (16) Device or resource busy
2013-11-12 09:47:14.169305 7f78fd548700  0 -- 10.200.1.54:6802/1734 >>
10.200.1.51:6800/13263 pipe(0x1e48c80 sd=42 :55275 s=2 pgs=5530 cs=1 l=0
c=0x1eae2c0).fault, initiating reconnect
2013-11-12 09:47:14.169444 7f78fd346700  0 -- 10.200.1.54:6802/1734 >>
10.200.1.57:6804/8226 pipe(0xc1ed500 sd=43 :47978 s=2 pgs=16845 cs=1 l=0
c=0x1eae840).fault, initiating reconnect
2013-11-12 09:47:14.169988 7f78fd144700  0 -- 10.200.1.54:6802/1734 >>
10.200.1.59:6810/4862 pipe(0xc1ed280 sd=46 :37094 s=2 pgs=42297 cs=1 l=0
c=0x1eae6e0).fault, initiating reconnect
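
One way to read the excerpt above: two ceph-osd processes start for the same
OSD within a few seconds (pid 1734, then pid 2769), and the second one hits the
lock_fsid failure. The Python sketch below pulls that out of a log
mechanically. It assumes each entry is on a single line (as in the log file on
disk, not as wrapped in this mail) and that the third field is a thread id
that stays constant during startup; summarize is a hypothetical helper name.

```python
import re

START_RE = re.compile(r"process ceph-osd, pid (\d+)")
LOCK_FAIL_RE = re.compile(r"lock_fsid failed to lock")
# date, time, then the hex thread id
THREAD_RE = re.compile(r"^\S+ \S+ ([0-9a-f]+) ")


def summarize(lines):
    """Return (pids that started, pids whose startup hit a lock_fsid failure)."""
    starts = []
    failed = []
    thread_to_pid = {}
    for line in lines:
        m = THREAD_RE.match(line)
        thread = m.group(1) if m else None
        s = START_RE.search(line)
        if s:
            pid = int(s.group(1))
            starts.append(pid)
            thread_to_pid[thread] = pid
        elif LOCK_FAIL_RE.search(line) and thread in thread_to_pid:
            # Attribute the failure to the process that logged from this thread.
            failed.append(thread_to_pid[thread])
    return starts, failed


# Three entries from the excerpt above, joined back onto single lines.
sample = [
    "2013-11-12 09:47:05.843425 7f7918e9d780  0 ceph version 0.72 "
    "(5832e2603c7db5d40b433d0953408993a9b7c217), process ceph-osd, pid 1734",
    "2013-11-12 09:47:09.670813 7f8151b5f780  0 ceph version 0.72 "
    "(5832e2603c7db5d40b433d0953408993a9b7c217), process ceph-osd, pid 2769",
    "2013-11-12 09:47:09.673789 7f8151b5f780  0 "
    "filestore(/var/lib/ceph/osd/ceph-19) lock_fsid failed to lock "
    "/var/lib/ceph/osd/ceph-19/fsid, is another ceph-osd still running? (11) "
    "Resource temporarily unavailable",
]
starts, failed = summarize(sample)
```

With the sample lines, starts comes back as [1734, 2769] and failed as [2769].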


And here is roughly the same snippet from just doing a 'sudo restart
ceph-osd-all':

2013-11-12 09:56:36.658014 7f7918e9d780  1 --
192.168.200.54:6811/1734shutdown complete.
2013-11-12 09:56:37.556988 7f3793c21780  0 ceph version 0.72
(5832e2603c7db5d40b433d0953408993a9b7c217), process ceph-osd, pid 13723
2013-11-12 09:56:37.559314 7f3793c21780  1
filestore(/var/lib/ceph/osd/ceph-19) mount detected xfs
2013-11-12 09:56:37.559319 7f3793c21780  1
filestore(/var/lib/ceph/osd/ceph-19)  disabling 'filestore replica fadvise'
due to known issues with fadvise(DONTNEED) on xfs
2013-11-12 09:56:37.561350 7f3793c21780  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features: FIEMAP
ioctl is supported and appears to work
2013-11-12 09:56:37.561360 7f3793c21780  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features: FIEMAP
ioctl is disabled via 'filestore fiemap' config option
2013-11-12 09:56:37.562357 7f3793c21780  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features:
syncfs(2) syscall fully supported (by glibc and kernel)
2013-11-12 09:56:37.571030 7f3793c21780  0
filestore(/var/lib/ceph/osd/ceph-19) mount: enabling WRITEAHEAD journal
mode: checkpoint is not enabled
2013-11-12 09:56:37.574273 7f3793c21780  1 journal _open
/var/lib/ceph/osd/ceph-19/journal fd 23: 10239344640 bytes, block size 4096
bytes, directio = 1, aio = 1
2013-11-12 09:56:37.578189 7f3793c21780  1 journal _open
/var/lib/ceph/osd/ceph-19/journal fd 23: 10239344640 bytes, block size 4096
bytes, directio = 1, aio = 1
2013-11-12 09:56:37.578854 7f3793c21780  1 journal close
/var/lib/ceph/osd/ceph-19/journal
2013-11-12 09:56:37.579638 7f3793c21780  1
filestore(/var/lib/ceph/osd/ceph-19) mount detected xfs
2013-11-12 09:56:37.581110 7f3793c21780  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features: FIEMAP
ioctl is supported and appears to work
2013-11-12 09:56:37.581118 7f3793c21780  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features: FIEMAP
ioctl is disabled via 'filestore fiemap' config option
2013-11-12 09:56:37.582014 7f3793c21780  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features:
syncfs(2) syscall fully supported (by glibc and kernel)
2013-11-12 09:56:37.583365 7f3793c21780  0
filestore(/var/lib/ceph/osd/ceph-19) mount: enabling WRITEAHEAD journal
mode: checkpoint is not enabled
2013-11-12 09:56:37.585765 7f3793c21780  1 journal _open
/var/lib/ceph/osd/ceph-19/journal fd 24: 10239344640 bytes, block size 4096
bytes, directio = 1, aio = 1
2013-11-12 09:56:37.588281 7f3793c21780  1 journal _open
/var/lib/ceph/osd/ceph-19/journal fd 24: 10239344640 bytes, block size 4096
bytes, directio = 1, aio = 1
2013-11-12 09:56:37.589782 7f3793c21780  0 <cls>
cls/hello/cls_hello.cc:271: loading cls_hello
2013-11-12 09:56:39.723134 7f377488b700  0 -- 10.200.1.54:6807/13723 >>
10.200.1.56:6806/563 pipe(0xc87ca00 sd=155 :38290 s=1 pgs=17864 cs=2 l=0
c=0xc893160).fault
2013-11-12 09:56:39.728798 7f3775194700  0 -- 10.200.1.54:6807/13723 >>
10.200.1.52:6808/14464 pipe(0xc811000 sd=52 :51030 s=1 pgs=7473 cs=6 l=0
c=0xc7fbb00).fault
2013-11-12 09:56:39.807114 7f37787ca700  0 -- 10.200.1.54:6807/13723 >>
10.200.1.52:6805/14449 pipe(0xc756280 sd=72 :46552 s=1 pgs=10912 cs=96 l=0
c=0xc740420).fault
2013-11-12 09:56:39.852465 7f3778ccf700  0 -- 10.200.1.54:6807/13723 >>
10.200.1.57:6804/8226 pipe(0x2427780 sd=83 :48234 s=1 pgs=17251 cs=128 l=0
c=0x2406dc0).fault
2013-11-12 09:56:39.898327 7f377488b700  0 -- 10.200.1.54:6807/13723 >>
10.200.1.56:6806/563 pipe(0xc87ca00 sd=42 :40942 s=1 pgs=17945 cs=164 l=0
c=0xc893160).fault
2013-11-12 09:56:40.738437 7f3775ea1700  0 -- 10.200.1.54:6807/13723 >>
10.200.1.60:6810/32089 pipe(0xc7c2500 sd=72 :40289 s=2 pgs=33225 cs=109 l=0
c=0xc7fb840).fault with nothing to send, going to standby
2013-11-12 09:56:40.740185 7f376b2fd700  0 -- 10.200.1.54:6807/13723 >>
10.200.1.60:6810/32089 pipe(0xcd66a00 sd=279 :6807 s=0 pgs=0 cs=0 l=0
c=0xc79d000).accept connect_seq 0 vs existing 109 state standby
2013-11-12 09:56:40.740201 7f376b2fd700  0 -- 10.200.1.54:6807/13723 >>
10.200.1.60:6810/32089 pipe(0xcd66a00 sd=279 :6807 s=0 pgs=0 cs=0 l=0
c=0xc79d000).accept peer reset, then tried to connect to us, replacing
2013-11-12 09:56:41.639911 7f376fd47700  0 -- 192.168.200.54:6806/13723 >>
192.168.48.127:0/234188561 pipe(0xcf87a00 sd=127 :6806 s=0 pgs=0 cs=0 l=0
c=0xcb80580).accept peer addr is really 192.168.48.127:0/234188561 (socket
is 192.168.48.127:60893/0)
2013-11-12 09:56:44.394952 7f37657a3700  0 -- 10.200.1.54:6807/13723 >>
10.200.1.54:6810/13792 pipe(0xcee7c80 sd=160 :6807 s=0 pgs=0 cs=0 l=0
c=0xd0d7160).accept connect_seq 0 vs existing 0 state connecting
2013-11-12 09:56:59.334100 7f3764396700  0 -- 192.168.200.54:6806/13723 >>
192.168.48.102:0/663636012 pipe(0xdbb9280 sd=197 :6806 s=0 pgs=0 cs=0 l=0
c=0xdbbc000).accept peer addr is really 192.168.48.102:0/663636012 (socket
is 192.168.48.102:35496/0)
2013-11-12 09:57:45.805456 7f3764194700  0 -- 192.168.200.54:6806/13723 >>
192.168.48.103:0/1090276439 pipe(0xdbb9000 sd=180 :6806 s=0 pgs=0 cs=0 l=0
c=0xce83dc0).accept peer addr is really 192.168.48.103:0/1090276439 (socket
is 192.168.48.103:41220/0)

After the 'restart ceph-osd-all' the admin sockets for all 4 OSDs on this
host are present.
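
A rough Python sketch for checking which OSD admin sockets are missing after a
boot. The ceph-osd.<id>.asok file name pattern is an assumption based on the
usual default admin socket path, /var/run/ceph is the directory mentioned
further down in this thread, and the function names are hypothetical.

```python
import glob
import os
import re

# Assumed default naming: <run_dir>/ceph-osd.<id>.asok
ASOK_RE = re.compile(r"^ceph-osd\.(\d+)\.asok$")


def present_osd_sockets(run_dir="/var/run/ceph"):
    """Return the set of OSD ids that have an .asok file in run_dir."""
    ids = set()
    for path in glob.glob(os.path.join(run_dir, "*.asok")):
        m = ASOK_RE.match(os.path.basename(path))
        if m:
            ids.add(int(m.group(1)))
    return ids


def missing_osd_sockets(expected_ids, run_dir="/var/run/ceph"):
    """Return the sorted list of expected OSD ids with no .asok file."""
    return sorted(set(expected_ids) - present_osd_sockets(run_dir))
```

Calling missing_osd_sockets with the ids of the OSDs on the host returns the
ones without a socket file. This only checks that the file name exists, not
that it is a live socket.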

Let me know if there is additional logging or assistance I can provide to
narrow it down.

Thanks,
Berant



On Tue, Nov 12, 2013 at 4:03 AM, Joao Luis <[email protected]> wrote:

>
> On Nov 12, 2013 2:38 AM, "Berant Lemmenes" <[email protected]> wrote:
> >
> > I noticed the same behavior on my dumpling cluster. They wouldn't show
> up after boot, but after a service restart they were there.
> >
> > I haven't tested a node reboot since I upgraded to emperor today. I'll
> give it a shot tomorrow.
> >
> > Thanks,
> > Berant
> >
> > On Nov 11, 2013 9:29 PM, "Peter Matulis" <[email protected]>
> wrote:
> >>
> >> After upgrading from Dumpling to Emperor on Ubuntu 12.04 I noticed the
> >> admin sockets for each of my monitors were missing although the cluster
> >> seemed to continue running fine.  There wasn't anything under
> >> /var/run/ceph.  After restarting the service on each monitor node they
> >> reappeared.  Anyone?
> >>
> >> ~pmatulis
> >>
>
> Odd behavior. The monitors do remove the admin socket on shutdown and
> proceed to create it when they start, but as long as they are running it
> should exist. Have you checked the logs for some error message that could
> provide more insight on the cause?
>
>   -Joao
>
> _______________________________________________
> >> ceph-users mailing list
> >> [email protected]
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
> > _______________________________________________
> > ceph-users mailing list
> > [email protected]
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
>

_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
