Tracker issue filed here with some additional details: https://tracker.ceph.com/issues/71501
Cluster version: 18.2.4

I came to assist with a non-functional cluster on which OSDs had been erroneously --force purged, leaving multiple (6) degraded + inactive PGs (4 remaining shards in a 4+2 EC pool) and 1 remapped+incomplete PG (3 remaining shards in a 4+2).

In an effort to restore order to the cluster and get backfill working: the degraded PGs shared a common primary OSD, and that OSD was restarted. Additionally, min_size was dropped from 5 to 4 for this pool as a temporary measure while the cluster recovered (command sketch in the P.S. below). This caused the inactive degraded PGs to go active and start their backfill.

The PGs worked steadily on backfill for a few hours. Almost immediately after the last of the 6 degraded PGs finished its backfill, the monitor quorum broke and the cluster became unresponsive. In this state 2 of the 5 mons showed 100% CPU usage.

In an attempt to fix the mon quorum, some combination of monitor service restarts was attempted, and ultimately all Ceph services were brought down in order to isolate the MON issue.

As things stand now, starting any combination of 3 MONs (quorum should be possible at 3 of 5) causes the lowest-ranked MON (the to-be-leader) to peg a CPU in the fn_monstore thread and renders the admin socket unresponsive on that mon only (the other 2 respond via their admin sockets and show either probing or electing). The pegged CPU only appears once the third MON is started, so it looks like an election issue. That MON has been left running for hours in this state with no progress. Eventually the to-be-leader does seem to reach the (leader) state (it claims "I win" in the logs), but the other 2 mons continue their election cycle, still showing probing or electing.

I have captured logs of the to-be-leader mon (its daemon was started first) with debug_mon = 20 and debug_ms = 20 (I should probably recapture with debug_paxos = 20 as well); how those were set is sketched in the P.S.

I have verified that the clocks are fine and NTP-synced to the same source on all the mons, and that connectivity between the mons on both messenger ports is functional.

The state the PGs were in leading up to the MON fault makes me believe the problem is more complex than a typical monitor election issue. Also of note: the backing disk of the to-be-leader MON is mostly idle.

At this point I am interested in taking backups of all the mon stores and injecting a modified single-mon monmap into one mon to see if we can get back up (the procedure I have in mind is sketched in the P.S.), but I am also concerned that that single mon would become the de-facto leader and be just as unresponsive.

Interested in any suggestions from the wider community. Thanks!

Respectfully,

Wes Dillingham
LinkedIn <http://www.linkedin.com/in/wesleydillingham>
w...@wesdillingham.com
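
P.S. For completeness, rough sketches of the commands behind the steps described above; pool names, mon IDs, hostnames, and paths are placeholders, and a cephadm/containerized deployment would need the equivalent container-aware commands. The min_size change was along these lines:

    # check the pool's current min_size (pool name is a placeholder)
    ceph osd pool get <ec-pool> min_size

    # temporary measure: allow the 4-shard PGs of the 4+2 pool to go active
    ceph osd pool set <ec-pool> min_size 4

    # to be reverted once recovery completes
    ceph osd pool set <ec-pool> min_size 5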
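
The debug logs were gathered with mon and messenger debugging turned up; with the quorum down it was roughly along these lines (set in ceph.conf before starting the daemon, or via the admin socket on a mon that still responds):

    # in ceph.conf on the mon host (the quorum is down, so 'ceph config set' is unavailable)
    [mon]
        debug_mon = 20
        debug_ms = 20
        debug_paxos = 20   # planned for the next capture

    # or, on a mon whose admin socket still responds:
    ceph daemon mon.<id> config set debug_mon 20
    ceph daemon mon.<id> config set debug_ms 20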
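
The clock and connectivity verification was nothing exotic, roughly the following (chronyc shown here; ntpq -p if ntpd is in use):

    # confirm every mon host is synced to the same NTP source
    chronyc tracking
    chronyc sources -v

    # confirm the other mons are reachable on both messenger ports (v2: 3300, v1: 6789)
    nc -zv <other-mon-ip> 3300
    nc -zv <other-mon-ip> 6789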
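
And the single-mon monmap idea is the standard extract/edit/inject procedure, sketched here under the assumption of a package-based deployment with the default mon data path (mon names are placeholders); nothing has been run yet:

    # stop all mons, then back up every mon store before touching anything
    systemctl stop ceph-mon@<id>                       # on each mon host
    cp -a /var/lib/ceph/mon/ceph-<id> /root/mon.<id>.backup

    # on the mon chosen to survive: extract its current monmap
    ceph-mon -i <id> --extract-monmap /tmp/monmap

    # remove the other four mons so only the chosen one remains, and sanity-check it
    monmaptool /tmp/monmap --rm <mon-b> --rm <mon-c> --rm <mon-d> --rm <mon-e>
    monmaptool /tmp/monmap --print

    # inject the single-mon monmap and start only that mon
    ceph-mon -i <id> --inject-monmap /tmp/monmap
    systemctl start ceph-mon@<id>

If that lone mon then pegs fn_monstore again, the backups at least make the change reversible.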