> > If I create an empty cluster, set a bigger timeout with :
> > ceph config set global mgr/cephadm/default_cephadm_command_timeout 1800
> >
> > Then apply my OSD spec, it works.
> >
> > When I tried that last time, I changed the timeout during cephadm
> > execution, so not all cephadm commands used it, yet the error
> > reporting the timeout did show the new value (1800)...
> >
> > What I've seen that could certainly be improved is the non-atomic,
> > even non-serialized sequence of operations, which leads to:
> > 1- all OSDs are created (OSD as in `ceph osd ls`)
> > 2- devices are created (PV/VG/LV/dmcrypt)
> > 3- then, one OSD at a time, the folder /var/lib/ceph/FSID/osd.ID with
> > the block and block.db links is created
> > 4- and the systemd unit is created and started
> >
> > The timeout happens during step 3.
>
>
> This has also been brought up already, though I'm not sure whether it
> was here or on Slack. It seems like one of those suboptimal default
> settings that works for most use cases, but not all. Maybe a note in
> the docs advising operators to increase the timeout when they intend
> to deploy many OSDs per node at once would suffice. Not sure whether
> adding one more option during bootstrap is worth the hassle.

That timeout seems a bit misplaced. If it runs the above loop and is
making progress, i.e. new OSDs are coming online, then it should reset
the timer or something, since the value 900 was just taken out of thin
air based on some design or some hardware fast enough not to take 900s,
not on any other metric. Timeouts should reflect how much patience we
have when it is NOT making progress.
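To make the idea concrete, here is a minimal Python sketch of such a
progress-aware timeout: the deadline is reset whenever the observed
progress marker changes (e.g. the count of OSDs online), so the wait
only expires after a full idle period with no progress. This is an
illustration of the proposal, not cephadm's actual implementation; the
function name and the poll() callback are hypothetical.

```python
import time

def wait_with_progress_timeout(poll, idle_timeout=900.0, poll_interval=1.0):
    """Wait until poll() reports completion, timing out only when no
    progress has been seen for idle_timeout seconds.

    poll() returns (done, progress_marker); whenever progress_marker
    changes, the deadline is pushed out by idle_timeout again.
    """
    last_marker = None
    deadline = time.monotonic() + idle_timeout
    while True:
        done, marker = poll()
        if done:
            return True
        if marker != last_marker:
            # Progress observed: be patient for another idle_timeout.
            last_marker = marker
            deadline = time.monotonic() + idle_timeout
        if time.monotonic() >= deadline:
            # No progress for a full idle_timeout: give up.
            return False
        time.sleep(poll_interval)
```

With this shape, a long OSD deployment that keeps bringing OSDs online
never times out, while a genuinely stuck one still fails after the
configured idle period.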

-- 
May the most significant bit of your life be positive.
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io