Hi all, I have a technical question about scrub scheduling. I replaced a disk 
and it is backfilling slowly. We have set osd_scrub_during_recovery = true and 
still observe that the number of PGs not scrubbed in time keeps increasing. 
Investigating the situation, it looks at first as if any OSD that has a PG in 
state "backfill_wait" or "backfilling" prevents scrubs from being scheduled on 
the PGs it is a member of. However, it seems it is not quite like that.
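
To make "not scrubbed in time" concrete, here is a minimal Python sketch of 
how the overdue PGs could be counted from per-PG stats. It assumes each record 
is a dict with "pgid" and "last_scrub_stamp" fields, as found in the JSON 
output of "ceph pg dump", and that the stamps are ISO 8601 strings with a 
numeric UTC offset. The timestamp format differs between releases, and the 
function name and the max_age threshold are my own choices for illustration, 
not the rule the cluster itself applies for the health warning.

```python
from datetime import datetime, timedelta, timezone

# Assumed timestamp layout, e.g. "2023-01-05T00:00:00.000000+0000".
# Older releases print a different layout; adjust if parsing fails.
STAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def overdue_pgs(pgs, now, max_age, stamp_key="last_scrub_stamp"):
    """Return the sorted IDs of PGs whose last scrub is older than max_age.

    pgs       -- iterable of dicts with "pgid" and a scrub timestamp field
    now       -- timezone-aware datetime to compare against
    max_age   -- timedelta; hypothetical threshold chosen by the caller
    stamp_key -- "last_scrub_stamp" or "last_deep_scrub_stamp"
    """
    overdue = []
    for pg in pgs:
        stamp = datetime.strptime(pg[stamp_key], STAMP_FORMAT)
        if now - stamp > max_age:
            overdue.append(pg["pgid"])
    return sorted(overdue)
```

Running this periodically and comparing the counts would show whether the 
backlog is really growing or only shifting between PGs.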

On the one hand, I have never seen a PG in a state like 
"active+scrubbing+remapped+backfilling", so backfilling PGs at least never seem 
to scrub. On the other hand, more PGs are scrubbed than would be eligible if 
*all* OSDs holding a remapped PG refused scrubs. The actual behaviour looks 
like something in between "only OSDs with a backfilling PG block requests for 
scrub reservations" and "all OSDs with a PG in states backfilling or 
backfill_wait block requests for scrub reservations". Does the position in the 
backfill reservation queue play a role?
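
The comparison between the two hypotheses can be sketched in Python roughly as 
follows. This is only an illustration, assuming each PG record is a dict with 
"pgid", "state", "up" and "acting" fields as in the JSON output of 
"ceph pg dump". The function names and result keys are made up for this 
sketch. It lists the PGs that are scrubbing right now and flags those that 
should not be scrubbing if the respective hypothesis were true.

```python
def pg_states(pg):
    """Split a state string like 'active+clean+scrubbing+deep' into a set."""
    return set(pg["state"].split("+"))


def pg_osds(pg):
    """All OSDs that appear in the up or acting set of a PG."""
    return set(pg.get("up", [])) | set(pg.get("acting", []))


def check_scrub_hypotheses(pgs):
    """Compare currently scrubbing PGs against two blocking hypotheses.

    narrow: only OSDs that are members of a 'backfilling' PG refuse scrubs.
    wide:   OSDs that are members of a 'backfilling' or 'backfill_wait' PG
            refuse scrubs.
    A scrubbing PG that touches a supposedly blocked OSD contradicts the
    hypothesis in question.
    """
    backfilling_osds = set()
    waiting_osds = set()
    for pg in pgs:
        states = pg_states(pg)
        if "backfilling" in states:
            backfilling_osds |= pg_osds(pg)
        if "backfill_wait" in states:
            waiting_osds |= pg_osds(pg)

    blocked_narrow = backfilling_osds
    blocked_wide = backfilling_osds | waiting_osds

    result = {"scrubbing": [], "contradicts_narrow": [], "contradicts_wide": []}
    for pg in pgs:
        if "scrubbing" not in pg_states(pg):
            continue
        result["scrubbing"].append(pg["pgid"])
        if pg_osds(pg) & blocked_narrow:
            result["contradicts_narrow"].append(pg["pgid"])
        if pg_osds(pg) & blocked_wide:
            result["contradicts_wide"].append(pg["pgid"])
    return result
```

A single snapshot can mislead, since a scrub may have started before the 
backfill reservation was taken, so this would have to be sampled repeatedly.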

If anyone has insight into when scrub reservations are granted and when they 
are refused while an OSD is backfilling, that would be great. My naive 
interpretation of "osd_scrub_during_recovery = true" was that scrubs proceed as 
if no backfill was going on. This, however, is clearly not the case. An answer 
to my question above would help me a lot in estimating when things will go back 
to normal.

Thanks a lot and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
