Your answer helped me understand the mechanism better.
I'm using erasure coding, and the backfilling step took quite a long time :(
If it had just been a lot of PG peering, I think that would be reasonable, but I was
curious why there was so much backfill_wait instead of peering.
(e.g. pg 9.5a is stuck undersized for 39h, current state
active+undersized+degraded+remapped+backfill_wait)

Please let me know if you have any tips to speed up backfill or to
prevent unnecessary backfill.
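
In the meantime, this is roughly what I'm planning to run before the next
planned host reboot, based on your advice about waiting for backfill to
finish, plus setting noout, which I understand should avoid unnecessary data
movement during a short reboot. I'm writing the commands from memory, so
please correct me if any of this is off:

  # make sure no backfill/recovery is still in flight
  ceph -s
  ceph pg dump pgs_brief | grep -E 'backfill|recovering'

  # keep OSDs from being marked out during the planned reboot
  ceph osd set noout

  # ... reboot the host, wait for its OSDs to come back up ...

  # restore normal behaviour afterwards
  ceph osd unset noout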
Thank you for your answer.

Joshua Baergen wrote:
> Hi Jaemin,
> 
> It is normal for PGs to become degraded during a host reboot, since a
> copy of the data was taken offline and needs to be resynchronized
> after the host comes back. Normally this is quick, as the recovery
> mechanism only needs to modify those objects that have changed while
> the host is down.
> 
> However, if you have backfills ongoing and reboot a host that contains
> OSDs involved in those backfills, then those backfills become
> degraded, and you will need to wait for them to complete for
> degradation to clear. Do you know if you had backfills at the time the
> host was rebooted? If so, the way to avoid this is to wait for
> backfill to complete before taking any OSDs/hosts down for
> maintenance.
> 
> Josh
