On Wed, Feb 21, 2018 at 2:46 PM Oliver Freyermuth wrote:
> Dear Cephalopodians,
> in a Luminous 12.2.3 cluster with a pool with:
> - 192 Bluestore OSDs total
> - 6 hosts (32 OSDs per host)
> - 2048 total PGs
> - EC profile k=4, m=2
> - CRUSH failure domain = host
> which results in 2048*6/192 = 64 PGs per OSD on average, I run into issues
> with PG overdose protection.
> When I reinstall one OSD host (zapping all disks) and recreate the OSDs
> one by one with ceph-volume, they usually come back "slowly", i.e. one
> after the other.
> This means the first OSD will initially be assigned all 2048 PGs (to
> fulfill the "failure domain host" requirement),
> thus breaking through the default osd_max_pg_per_osd_hard_ratio of 2.
> We also use the default mon_max_pg_per_osd, i.e. 200.
> This appears to cause the previously active (but of course
> undersized+degraded) PGs to enter an "activating+remapped" state,
> and hence they become unavailable.
> Thus, data availability is reduced. All this is caused by adding an OSD!
> Of course, as more and more OSDs are added until all 32 are back online,
> the situation eases.
> Still, I observe that some PGs get stuck in this "activating" state, and I
> can't figure out the actual reason from the logs or by dumping the PGs.
> Waiting does not help: the PGs stay "activating" and the data stays
> inaccessible.
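If I'm reading the overdose math right, that part is expected during the
window you describe: the activation limit works out to mon_max_pg_per_osd
(200) * osd_max_pg_per_osd_hard_ratio (2) = 400 PGs per OSD, while the first
recreated OSD on the host is briefly mapped to all 2048. You can watch the
per-OSD PG counts while it happens, e.g.:

    # the PGS column shows the current PG count on each OSD
    ceph osd df tree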
Can you upload logs (via ceph-post-file) from each of the OSDs that are (and
should be, but aren't) involved with one of the PGs this happens to, and
create a tracker ticket about it?
Once you have a good map, all the PGs should definitely activate themselves.
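Something like the following should be enough to collect that (the PG id and
log path are just placeholders, adjust to what your cluster shows):

    # list PGs that are stuck inactive, e.g. stuck "activating"
    ceph pg dump_stuck inactive

    # query one of them to see which OSDs are (or should be) in its up/acting sets
    ceph pg 1.2ab query

    # upload the matching OSD logs; ceph-post-file prints a tag you can quote in the ticket
    ceph-post-file -d "PGs stuck activating after host reinstall" /var/log/ceph/ceph-osd.17.log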
> Waiting a bit and manually restarting the ceph-osd services on the
> reinstalled host seems to bring them back.
> Also, adjusting osd_max_pg_per_osd_hard_ratio to something large (e.g. 10)
> appears to prevent the issue.
> So my best guess is that this is related to PG overdose protection.
> Any ideas on how best to overcome this, or similar observations?
> It would be nice to be able to reinstall an OSD host without temporarily
> making data unavailable; right now the only thing that comes to mind is to
> effectively disable PG overdose protection.
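For the record, here's a sketch of the workaround you describe (raising the
hard ratio rather than disabling the protection outright). I'm not certain
this particular option is picked up at runtime, so the ceph.conf entry plus
an OSD restart is the safe variant:

    # persistent: in ceph.conf on the OSD hosts, [osd] or [global] section
    osd max pg per osd hard ratio = 10

    # or injected at runtime (may report that a restart is still required)
    ceph tell osd.* injectargs '--osd_max_pg_per_osd_hard_ratio 10'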