Hi Sage,

It would be nice to have this one backported to Luminous, if that's easy.

Cheers,
Frédéric.

> On 7 Jun 2018, at 13:33, Sage Weil <[email protected]> wrote:
> 
> On Wed, 6 Jun 2018, Caspar Smit wrote:
>> Hi all,
>> 
>> We have a Luminous 12.2.2 cluster with 3 nodes and I recently added a node
>> to it.
>> 
>> osd_max_backfills is at its default of 1, so backfilling didn't go very
>> fast, but that doesn't matter.
>> 
>> Once it started backfilling everything looked ok:
>> 
>> ~300 PGs in backfill_wait
>> ~10 PGs backfilling (~ the number of new OSDs)
>> 
>> But I noticed the number of degraded objects increasing a lot. I presume a
>> PG that is in backfill_wait state doesn't accept any new writes anymore?
>> Hence the increasing degraded objects?
>> 
>> So far so good, but once in a while I noticed a random OSD flapping (they
>> come back up automatically). This isn't because the disk is saturated, but
>> because of a driver/controller/kernel incompatibility which 'hangs' the
>> disk for a short time (a SCSI abort_task error in syslog). Investigating
>> further, I noticed this was already the case before the node expansion.
>> 
>> These flapping OSDs result in lots of PG states which are a bit worrying:
>> 
>>             109 active+remapped+backfill_wait
>>             80  active+undersized+degraded+remapped+backfill_wait
>>             51  active+recovery_wait+degraded+remapped
>>             41  active+recovery_wait+degraded
>>             27  active+recovery_wait+undersized+degraded+remapped
>>             14  active+undersized+remapped+backfill_wait
>>             4   active+undersized+degraded+remapped+backfilling
>> 
>> I think the recovery_wait is more important than the backfill_wait, so I'd
>> like to prioritize these, because the recovery_wait was triggered by the
>> flapping OSDs.
> 
> Just a note: this is fixed in mimic.  Previously, we would choose the 
> highest-priority PG to start recovery on at the time, but once recovery 
> had started, the appearance of a new PG with a higher priority (e.g., 
> because it finished peering after the others) wouldn't preempt/cancel the 
> other PG's recovery, so you would get behavior like the above.
> 
> Mimic implements that preemption, so you should not see behavior like 
> this.  (If you do, then the function that assigns a priority score to a 
> PG needs to be tweaked.)
> 
> sage
> _______________________________________________
> ceph-users mailing list
> [email protected] <mailto:[email protected]>
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> <http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com>
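
For anyone stuck on Luminous in the meantime, a possible workaround (a sketch, not a tested procedure): later Luminous point releases ship `ceph pg force-recovery` and `ceph pg force-backfill`, which let an operator manually bump specific PGs ahead of the queue. The PG IDs below are made-up placeholders; check `ceph pg force-recovery -h` to confirm your build has the command.

```shell
# List PGs that are waiting on recovery (exact filter syntax can vary
# between releases; see `ceph pg ls -h`).
ceph pg ls recovery_wait

# Manually prioritize recovery of those PGs over pending backfills.
# PG IDs here (2.18, 2.3f) are hypothetical examples -- substitute the
# IDs reported by the previous command.
ceph pg force-recovery 2.18 2.3f

# The flag can be cleared again once recovery completes.
ceph pg cancel-force-recovery 2.18 2.3f
```

This is only a manual stopgap for the scheduling behavior Sage describes; the automatic preemption itself is what landed in Mimic.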


