On Wed, Jan 4, 2017 at 6:05 PM, 许雪寒 <xuxue...@360.cn> wrote:
> We've already restarted the OSD successfully.
> Now, we are trying to figure out why the OSD committed suicide.

A network issue that makes communication with the other OSDs in the same
acting set unstable is the usual cause of an OSD suicide.

>
> Re: [ceph-users] Is this a deadlock?
>
> Hi, thanks for the quick reply.
>
> We manually deployed this OSD, and it has been running for more than half a
> year. The output last night should be the latter one that you mentioned. Last
> night, one of our switches had a problem and disconnected the OSD from its
> peers, which in turn made the monitor wrongly mark the OSD down.
>
> Thank you:-)
>
>
>
> On Wed, 4 Jan 2017 07:49:03 +0000 许雪寒 wrote:
>
>> Hi, everyone.
>>
>> Recently, in one of our production Ceph clusters, one OSD committed suicide
>> after experiencing a network connectivity problem, and the OSD log is as
>> follows:
>>
>
> Version of Ceph and all relevant things would help.
> Also "some network connectivity problem" is vague, if it were something like
> a bad port or overloaded switch you'd think that more than one OSD would be
> affected.
>
> [snip, I have nothing to comment on that part]
>>
>>
>
>> And by the way, when we first tried to restart the OSD that committed
>> suicide with “/etc/init.d/ceph start osd.619”, an error was reported, and it
>> said something like “OSD.619 is not found”, as if OSD.619 had never been
>> created in this cluster. We are really confused, please help us.
>>
> How did you create that OSD?
> Manually or with ceph-deploy?
> The fact that you're trying to use a SYS-V initscript suggests both an older
> Ceph version and OS and thus more likely a manual install.
>
> In which case that OSD needs to be defined in ceph.conf on that node.
> Full output of that error message would have told us these things, like:
> ---
> root@ceph-04:~# /etc/init.d/ceph start osd.444
> /etc/init.d/ceph: osd.444 not found (/etc/ceph/ceph.conf defines mon.ceph-04
> osd.25 osd.31 osd.30 osd.26 osd.29 osd.27 osd.28 osd.24 , /var/lib/ceph
> defines mon.ceph-04 osd.25 osd.31 osd.30 osd.26 osd.29 osd.27 osd.28 osd.24)
> ---
> The above is the output from a Hammer cluster with OSDs deployed with
> ceph-deploy.
> And incidentally the "ceph.conf" part of the output is a blatant lie and just
> a repetition of what it gathered from /var/lib/ceph.
>
> This is a Hammer cluster with manually deployed OSDs:
> ---
> engtest03:~# /etc/init.d/ceph start osd.33
> /etc/init.d/ceph: osd.33 not found (/etc/ceph/ceph.conf defines mon.engtest03
> mon.engtest04 mon.engtest05 mon.irt03 mon.irt04 mds.engtest03 osd.20 osd.21
> osd.22 osd.23, /var/lib/ceph defines )
> ---
>
> Christian
> --
> Christian Balzer Network/Systems Engineer
> ch...@gol.com Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
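
For the manual-install case discussed in this thread, where the OSD has to be defined in ceph.conf on its node, the check can be roughly sketched as follows. This is only an approximation and not the init script's own code: it assumes the script looks for an [osd.N] section whose "host" entry equals the node's short hostname, it uses Python's configparser rather than Ceph's own parser (ceph-conf), and the hostnames and the function name osd_defined_for_host are hypothetical.

```python
import configparser


def osd_defined_for_host(conf_text: str, osd_id: int, short_hostname: str) -> bool:
    """Return True if conf_text has an [osd.<id>] section whose host matches.

    Assumption: the SysV init script treats an OSD as local when its
    ceph.conf section carries a "host" entry equal to `hostname -s`.
    """
    # strict=False tolerates duplicate keys; interpolation=None keeps '%' literal.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    section = f"osd.{osd_id}"
    if not parser.has_section(section):
        return False
    return parser.get(section, "host", fallback="") == short_hostname


# Hypothetical ceph.conf fragment; the hostnames are placeholders.
SAMPLE_CONF = """\
[osd.619]
host = ceph-node-01

[osd.620]
host = ceph-node-02
"""

print(osd_defined_for_host(SAMPLE_CONF, 619, "ceph-node-01"))  # True
print(osd_defined_for_host(SAMPLE_CONF, 620, "ceph-node-01"))  # False: other host
print(osd_defined_for_host(SAMPLE_CONF, 444, "ceph-node-01"))  # False: no section
```

On a real node the text would come from /etc/ceph/ceph.conf. A False result for the OSD in question would be consistent with the "not found" message quoted above, though the script's full output remains the authoritative source.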