Dear Cephalopodians, 

In a Luminous 12.2.3 cluster with a pool configured as follows:
- 192 Bluestore OSDs total
- 6 hosts (32 OSDs per host)
- 2048 total PGs
- EC profile k=4, m=2
- CRUSH failure domain = host
which results in 2048 * 6 / 192 = 64 PGs per OSD on average (each PG places 
one shard on k+m = 6 OSDs), I run into issues with PG overdose protection. 
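
For reference, the actual per-OSD distribution can be checked at any time 
(the PGS column should hover around 64 in a healthy state):

    # per-OSD utilization and PG counts, grouped by the CRUSH tree
    ceph osd df tree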

When I reinstall one OSD host (zapping all disks) and recreate the OSDs one 
by one with ceph-volume, they usually come back "slowly", i.e. one after the 
other. 
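
For concreteness, the per-disk recreation step looks roughly like this 
(the device name is a placeholder, and the --destroy flag assumes a 
sufficiently recent ceph-volume):

    # wipe the old LVM/partition state, then create a fresh BlueStore OSD
    ceph-volume lvm zap /dev/sdb --destroy
    ceph-volume lvm create --bluestore --data /dev/sdb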

This means the first recreated OSD is initially assigned all 2048 PGs (to 
fulfill the "failure domain = host" requirement), blowing far past the hard 
limit implied by our defaults: mon_max_pg_per_osd = 200 and 
osd_max_pg_per_osd_hard_ratio = 2, i.e. a hard cap of 200 * 2 = 400 PGs per 
OSD. 
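
The values in effect can be confirmed on a running OSD via its admin socket 
(the OSD id is a placeholder):

    # show the overdose-protection settings in effect on osd.0
    ceph daemon osd.0 config show | grep -E 'mon_max_pg_per_osd|osd_max_pg_per_osd_hard_ratio'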

This appears to push the previously active (but of course 
undersized+degraded) PGs into an "activating+remapped" state, in which they 
are unavailable. Data availability is thus reduced, and all of this is 
caused merely by adding an OSD! 

Of course, as more and more OSDs are added and all 32 eventually come back 
online, the situation relaxes. Still, I observe that some PGs remain stuck 
in this "activating" state, and I cannot figure out the actual reason from 
the logs or by dumping the PGs. Waiting does not help: the PGs stay 
"activating" and the data stays inaccessible. 
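
For the record, this is the kind of dumping I mean (standard commands; 
<pgid> is a placeholder):

    # list PGs stuck in a non-active state
    ceph pg dump_stuck inactive
    # inspect the peering/recovery state of a single stuck PG
    ceph pg <pgid> query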

Waiting a bit and then manually restarting the ceph-osd services on the 
reinstalled host seems to bring them back. Likewise, raising 
osd_max_pg_per_osd_hard_ratio to something large (e.g. 10) appears to 
prevent the issue altogether. 
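
Concretely, the two workarounds amount to the following (injectargs only 
changes the value at runtime; the systemd target assumes the stock unit 
layout):

    # restart all OSD daemons on the reinstalled host
    systemctl restart ceph-osd.target

    # or: raise the hard ratio on all OSDs at runtime
    ceph tell osd.* injectargs '--osd_max_pg_per_osd_hard_ratio 10'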

So my best guess is that this is related to PG overdose protection. 
Any ideas on how best to overcome this, or similar observations? 

It would be nice to be able to reinstall an OSD host without temporarily 
making data unavailable; right now, the only thing that comes to mind is to 
effectively disable PG overdose protection. 
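
If it comes to that, the persistent variant would be a ceph.conf entry on 
the OSD hosts, with a ratio chosen so high it is effectively never reached:

    # /etc/ceph/ceph.conf on the OSD hosts
    [osd]
    osd max pg per osd hard ratio = 10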

Cheers,
        Oliver
