Hi all,
I recently encountered a situation where some partially removed OSDs caused my cluster to enter a "stuck inactive" state. The eventual solution was to tell Ceph the OSDs were "lost". Because all the PGs were replicated elsewhere on the cluster, no data was lost.
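For reference, the commands involved were roughly the following (osd.12 is just a placeholder for the affected OSD id):

    ceph health detail                         # reported the PGs stuck inactive
    ceph pg dump_stuck inactive                # list exactly which PGs are stuck
    ceph osd lost 12 --yes-i-really-mean-it    # declare osd.12 permanently gone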

Would it make sense, or be possible, for Ceph to automatically detect this situation ("stuck inactive" PGs whose data is replicated elsewhere) and take action to un-stick the cluster? E.g. automatically mark the OSD as lost, or make marking the OSD down and out have the same effect?
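To make that concrete, here is a very rough sketch of the kind of check I'm imagining (hypothetical and untested; the grep depends on the exact wording "ceph health detail" uses on a given release):

    #!/bin/sh
    # Hypothetical sketch only -- not something to run blindly in production.
    # Idea: if PGs are stuck inactive and the blocking OSD is already down and
    # out, mark that OSD lost so recovery can proceed from surviving replicas.
    OSD_ID="$1"
    if ceph health detail | grep -q 'stuck inactive'; then
        # A real implementation would first verify that every affected PG
        # still has a complete copy on some other OSD before declaring
        # anything lost.
        ceph osd lost "$OSD_ID" --yes-i-really-mean-it
    fi

Obviously the hard part is the safety check in the middle, which is exactly what I'd hope Ceph itself could do reliably.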

Ideally anything that can be safely automated should be. :)

Thanks!
C.
