Hi Sam,
We discussed this briefly on IRC, but I think it might be better to recap in an
email.
Currently we schedule backfill/recovery based on how degraded the PG is,
with a factor distinguishing recovery vs. backfill (recovery always has higher
priority). The degradation level of a PG is calculated as:
{expected_pool_size} - {acting_set_size}. I think there are two issues with the
current approach:
1. The current {acting_set_size} might not capture the degradation level over
the past intervals. For example, say we have two PGs (erasure coded with 8 data
and 3 parity chunks), 1.0 and 1.1:
    1.1 At t1, PG 1.0's acting set size drops to 8 while PG 1.1's acting set is
still 11
    1.2 At t2, PG 1.0's acting set size comes back to 10 while PG 1.1's acting
set drops to 9
1.3 At t3, we start recovering (e.g. mark out some OSDs)
With the current algorithm, PG 1.1 will recover first and then PG 1.0 (if the
concurrency is configured as 1); however, from a data durability perspective,
the data written to PG 1.0 between t1 and t2 is more degraded and at higher
risk (see the sketch after the list below).
2. The algorithm does not take EC vs. replication into account (nor the EC
profile), which might also be important to consider for data durability.
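To make point 1 concrete, here is a minimal sketch (plain C++, not Ceph code;
the 8+3 profile, the numbers, and the min_acting_over_past bookkeeping are just
my illustrative assumptions) comparing the current size-at-scheduling-time
ordering with an ordering that also considers how small the acting set was when
the data was written:

#include <algorithm>
#include <cstdio>
#include <vector>

struct PGState {
  const char *name;
  int acting_set_size;       // acting set size at scheduling time (t3)
  int min_acting_over_past;  // smallest acting set seen since the data was written
};

int main() {
  const int expected_pool_size = 11;  // EC 8+3
  std::vector<PGState> pgs = {
      {"1.0", 10, 8},  // dropped to 8 at t1, back to 10 at t2
      {"1.1", 9, 9},   // dropped to 9 at t2
  };

  // Current approach: priority derived only from the acting set size right now.
  std::sort(pgs.begin(), pgs.end(), [&](const PGState &a, const PGState &b) {
    return expected_pool_size - a.acting_set_size >
           expected_pool_size - b.acting_set_size;
  });
  std::printf("current order: %s, %s\n", pgs[0].name, pgs[1].name);
  // Prints "1.1, 1.0": degradation 2 vs. 1, so 1.1 recovers first, even though
  // objects written to 1.0 between t1 and t2 may exist in only 8 of 11 chunks.

  // Durability-oriented alternative (my assumption, not an existing option):
  // rank by the worst degradation the PG's data may have seen.
  std::sort(pgs.begin(), pgs.end(), [&](const PGState &a, const PGState &b) {
    return expected_pool_size - a.min_acting_over_past >
           expected_pool_size - b.min_acting_over_past;
  });
  std::printf("durability-aware order: %s, %s\n", pgs[0].name, pgs[1].name);
  // Prints "1.0, 1.1".
  return 0;
}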
Is my understanding correct here?
Thanks,
Guang