Hi,

A word of caution for Ceph operators out there. Be careful with the "ceph osd primary-temp" command. TL;DR: with primary_temp mappings active, a CRUSH change might CRASH your OSDs ... and in almost all cases they won't come back online after a restart.
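If you want to check whether a cluster currently has any primary_temp mappings active, something like the check below should do; we are assuming here that "ceph osd dump" prints active primary_temp entries on your version, so double-check:

  ceph osd dump | grep primary_temp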

The bug is described in this tracker [1], and fixed with this PR [2] (thanks Igor!).

The longer story is that we were inspired by the work done on the read-balancer [3] and wondered if we could leverage it on older clusters. It turned out this is indeed possible by using the "primary-temp" command instead of the "pg-primary-temp" command that will be available from Reef onward. We compiled a main version of the "osdmaptool", fed it OSD maps from our Pacific clusters and had it calculate the optimal primary PG distributions. We then replaced the "pg-primary-temp" commands in its output with "primary-temp" and applied them. That worked as expected.

However, we hit a bug [1] in the Ceph code that cannot handle changes in the CRUSH map while primary_temp mappings are active. We added a new storage node to the cluster, and this condition was triggered as soon as we moved the node into the proper failure domain: Ceph tried to make an OSD primary that was no longer in the acting set, and the OSDs crashed (most often with a segmentation fault, sometimes with an abort). This resulted in many OSD crashes across the failure domains and effectively took down the whole cluster.
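For reference, this is roughly what our workflow looked like. File names and the pool name are placeholders, and the exact osdmaptool options may differ depending on the state of main when you build it, so treat this as a sketch rather than a recipe:

  # grab the current osdmap from the (pacific) cluster
  ceph osd getmap -o osdmap.bin

  # have the main-branch osdmaptool calculate an optimal primary
  # distribution; it writes the commands to apply into out.txt
  ./osdmaptool osdmap.bin --read out.txt --read-pool <poolname>

  # the generated commands used "pg-primary-temp", which pacific
  # does not have, so rewrite them to "primary-temp" before applying
  sed -i 's/pg-primary-temp/primary-temp/' out.txt
  bash out.txt

Given what happened next, we obviously don't recommend doing this on a production cluster as long as [1] is unfixed on your release.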

Whether the Reef / main read-balancer code can suffer from the same bug is as yet unknown (at least to us). We will try to build a Reef test cluster and find out.

Those of you who want to know how we handled this incident can read the RFO ([4] in Dutch, [5] in English).

Gr. Stefan

[1]: https://tracker.ceph.com/issues/59491?next_issue_id=59490
[2]: https://github.com/ceph/ceph/pull/51160
[3]: https://github.com/ljflores/ceph_read_balancer_2023
[4]: https://www.bit.nl/uploads/images/PDF-Files/RFO-20230314-185335.pdf
[5]: https://www.bit.nl/uploads/images/PDF-Files/2023.04.20%20RFO_Ceph Cluster_185335_EN.pdf