I have this issue with my NVMe OSDs, but not my HDD OSDs.  I have 15 HDDs
and 2 NVMes in each host.  We put most of the journals on one of the
NVMes and a few on the second, and added a small OSD partition to the
second NVMe for RGW metadata pools.

When I restart a server manually for testing, the NVMe OSD comes back up
normally.  Separately, we're tracking a problem where OSD nodes freeze and
have to be force-rebooted.  After a forced reboot, the NVMe OSD doesn't come
back on its own until I run `ceph-disk activate-all`.  This seems to track
with your theory that a non-clean FS is part of the equation.

Are there any ideas yet on how to resolve this?  So far, being able to run
`ceph-disk activate-all` is good enough, but it is a bit of a nuisance.
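
In the meantime, a minimal sketch of how the manual step could be scripted
after a forced reboot. It counts the OSD data directories that are actually
mounted and runs `ceph-disk activate-all` only if fewer than expected are
present. The function names, the `OSD_ROOT` and `ACTIVATE_CMD` variables, and
the expected-count argument are all hypothetical, and it assumes OSD data
partitions are mounted under /var/lib/ceph/osd. It works around the symptom
and does not address the underlying startup race.

```shell
#!/bin/sh
# Sketch only: re-run OSD activation when fewer OSD data dirs are mounted
# than expected. OSD_ROOT and ACTIVATE_CMD are assumptions; adjust per host.

OSD_ROOT="${OSD_ROOT:-/var/lib/ceph/osd}"
ACTIVATE_CMD="${ACTIVATE_CMD:-ceph-disk activate-all}"

# Print the number of directories under $1 that are mount points.
count_mounted_osds() {
    count=0
    for dir in "$1"/*; do
        [ -d "$dir" ] || continue
        if mountpoint -q "$dir"; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Run the activation command if fewer than $1 OSD dirs are mounted.
activate_if_missing() {
    expected="$1"
    mounted=$(count_mounted_osds "$OSD_ROOT")
    if [ "$mounted" -lt "$expected" ]; then
        echo "only $mounted of $expected OSD dirs mounted; activating" >&2
        $ACTIVATE_CMD
    fi
}
```

Something like `activate_if_missing N`, with N set to the number of OSDs the
host should have, could then be called a few minutes after boot, for example
from cron or a late-starting unit.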

On Fri, Sep 15, 2017 at 11:48 AM Matthew Vernon <[email protected]> wrote:

> Hi,
>
> On 14/09/17 16:26, Götz Reinicke wrote:
>
> > maybe someone has a hint: I have a Ceph cluster (6 nodes, 144
> > OSDs), CentOS 7.3, Ceph 10.2.7.
> >
> > I did a kernel update to the most recent CentOS 7.3 one on a node and did a
> > reboot.
> >
> > After that, 10 OSDs did not come up like the others. Their disks did not
> > get mounted and the OSD processes did nothing … even after a couple of
> > minutes no more disks/OSDs showed up.
> >
> > So I did a ceph-disk activate-all.
> >
> > And all missing OSDs got back online.
> >
> > Questions: Any hints on debugging why the disks did not come online after
> > the reboot?
>
> We've been seeing this on our Ubuntu / Jewel cluster, after we upgraded
> from ceph 10.2.3 / kernel 4.4.0-62 to ceph 10.2.7 / kernel 4.4.0-93.
>
> I'm still digging, but AFAICT it's a race condition in startup - in our
> case, we're only seeing it if some of the filesystems aren't clean. This
> may be related to the thread "Very slow start of osds after reboot" from
> August, but I don't think any conclusion was reached there.
>
> Regards,
>
> Matthew
>
>
> --
>  The Wellcome Trust Sanger Institute is operated by Genome Research
>  Limited, a charity registered in England with number 1021457 and a
>  company registered in England with number 2742969, whose registered
>  office is 215 Euston Road, London, NW1 2BE.
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
