Hi Sage,

It would be nice to have this one backported to Luminous, if it's easy.
Cheers,
Frédéric.

> On 7 June 2018 at 13:33, Sage Weil <[email protected]> wrote:
>
> On Wed, 6 Jun 2018, Caspar Smit wrote:
>> Hi all,
>>
>> We have a Luminous 12.2.2 cluster with 3 nodes and I recently added a node
>> to it.
>>
>> osd-max-backfills is at the default of 1, so backfilling didn't go very fast,
>> but that doesn't matter.
>>
>> Once it started backfilling everything looked OK:
>>
>> ~300 pgs in backfill_wait
>> ~10 pgs backfilling (~number of new OSDs)
>>
>> But I noticed the degraded objects increasing a lot. I presume a pg that is
>> in backfill_wait state doesn't accept any new writes anymore? Hence the
>> increasing degraded objects?
>>
>> So far so good, but once in a while I noticed a random OSD flapping (they
>> come back up automatically). This isn't because the disk is saturated but
>> because of a driver/controller/kernel incompatibility which 'hangs' the disk
>> for a short time (scsi abort_task error in syslog). Investigating further, I
>> noticed this was already the case before the node expansion.
>>
>> These flapping OSDs result in lots of pg states which are a bit worrying:
>>
>> 109 active+remapped+backfill_wait
>>  80 active+undersized+degraded+remapped+backfill_wait
>>  51 active+recovery_wait+degraded+remapped
>>  41 active+recovery_wait+degraded
>>  27 active+recovery_wait+undersized+degraded+remapped
>>  14 active+undersized+remapped+backfill_wait
>>   4 active+undersized+degraded+remapped+backfilling
>>
>> I think the recovery_wait is more important than the backfill_wait, so I'd
>> like to prioritize these, because the recovery_wait was triggered by the
>> flapping OSDs.
>
> Just a note: this is fixed in mimic. Previously, we would choose the
> highest-priority PG to start recovery on at the time, but once recovery
> had started, the appearance of a new PG with a higher priority (e.g.,
> because it finished peering after the others) wouldn't preempt/cancel the
> other PG's recovery, so you would get behavior like the above.
>
> Mimic implements that preemption, so you should not see behavior like
> this. (If you do, then the function that assigns a priority score to a
> PG needs to be tweaked.)
>
> sage
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
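To make the preemption point above concrete, here is a minimal Python sketch of a recovery-slot scheduler. It is a toy model under stated assumptions, not Ceph's implementation: the names RecoveryScheduler, request, finish and demo are hypothetical, the priority numbers are arbitrary, and max_active only plays the role that osd_max_backfills plays on a real OSD. With preempt=False it mimics the pre-Mimic behavior Sage describes (a higher-priority PG that shows up later just waits); with preempt=True it cancels the lowest-priority running PG and requeues it, roughly what he describes for Mimic.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class PGWork:
    # heapq is a min-heap, so the sort key is the negated priority:
    # the highest-priority PG is always at waiting[0].
    sort_key: int
    pgid: str = field(compare=False)
    priority: int = field(compare=False)


class RecoveryScheduler:
    """Hypothetical toy model of per-OSD recovery slots (not Ceph code)."""

    def __init__(self, max_active: int, preempt: bool):
        self.max_active = max_active  # stands in for osd_max_backfills
        self.preempt = preempt
        self.waiting: list[PGWork] = []
        self.active: dict[str, int] = {}  # pgid -> priority

    def request(self, pgid: str, priority: int) -> None:
        heapq.heappush(self.waiting, PGWork(-priority, pgid, priority))
        self._schedule()

    def finish(self, pgid: str) -> None:
        self.active.pop(pgid, None)
        self._schedule()

    def _schedule(self) -> None:
        while self.waiting:
            best = self.waiting[0]
            if len(self.active) < self.max_active:
                # A slot is free: start the highest-priority waiting PG.
                heapq.heappop(self.waiting)
                self.active[best.pgid] = best.priority
                continue
            if not self.preempt:
                # Pre-Mimic style: running recovery is never cancelled.
                break
            # Preemptive style: find the lowest-priority running PG.
            victim = min(self.active, key=self.active.get)
            if self.active[victim] >= best.priority:
                break
            # Cancel the victim's recovery and put it back in the queue.
            victim_prio = self.active.pop(victim)
            heapq.heappush(self.waiting, PGWork(-victim_prio, victim, victim_prio))


def demo(preempt: bool) -> list[str]:
    s = RecoveryScheduler(max_active=1, preempt=preempt)
    s.request("1.a", priority=100)  # e.g. a backfill that started first
    s.request("1.b", priority=180)  # e.g. a degraded PG that finished peering later
    return sorted(s.active)


print(demo(preempt=False))  # ['1.a']
print(demo(preempt=True))   # ['1.b']
```

The only difference between the two runs is whether a newly arrived, higher-priority PG may evict one that is already recovering; without that, the order in which PGs finish peering decides which one gets the single slot.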
