Well, I let it run with the nodown flag set and it looked like it would
finish, but it all went wrong somewhere.
This is now the state:
health: HEALTH_ERR
nodown flag(s) set
5602396/94833780 objects misplaced (5.908%)
Reduced data availability: 143 pgs inactive, 142 pgs peering, 7 pgs stale
Degraded data redundancy: 248859/94833780 objects degraded (0.262%),
194 pgs unclean, 21 pgs degraded, 12 pgs undersized
11 stuck requests are blocked > 4096 sec
pgs: 13.965% pgs not active
248859/94833780 objects degraded (0.262%)
5602396/94833780 objects misplaced (5.908%)
830 active+clean
75 remapped+peering
66 peering
26 active+remapped+backfill_wait
6 active+undersized+degraded+remapped+backfill_wait
6 active+recovery_wait+degraded+remapped
3 active+undersized+degraded+remapped+backfilling
3 stale+active+undersized+degraded+remapped+backfill_wait
3 stale+active+remapped+backfill_wait
2 active+recovery_wait+degraded
2 active+remapped+backfilling
1 activating+degraded+remapped
1 stale+remapped+peering
# ceph health detail shows:
REQUEST_STUCK 11 stuck requests are blocked > 4096 sec
11 ops are blocked > 16777.2 sec
osds 4,7,23,24 have stuck requests > 16777.2 sec
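As a sanity check on the numbers: the percentages Ceph reports above can be
reproduced from the raw object and PG counts in the status output. A minimal
Python sketch (counts copied verbatim from the output above):

```python
# Reproduce the percentages in the ceph status output from the raw counts.
total_objects = 94833780
degraded = 248859
misplaced = 5602396

# Per-state PG counts from the "pgs:" section above.
pg_states = {
    "active+clean": 830,
    "remapped+peering": 75,
    "peering": 66,
    "active+remapped+backfill_wait": 26,
    "active+undersized+degraded+remapped+backfill_wait": 6,
    "active+recovery_wait+degraded+remapped": 6,
    "active+undersized+degraded+remapped+backfilling": 3,
    "stale+active+undersized+degraded+remapped+backfill_wait": 3,
    "stale+active+remapped+backfill_wait": 3,
    "active+recovery_wait+degraded": 2,
    "active+remapped+backfilling": 2,
    "activating+degraded+remapped": 1,
    "stale+remapped+peering": 1,
}
total_pgs = sum(pg_states.values())  # 1024

# Inactive = the 142 peering PGs plus the single activating one.
inactive = (pg_states["remapped+peering"]
            + pg_states["peering"]
            + pg_states["stale+remapped+peering"]
            + pg_states["activating+degraded+remapped"])  # 143

print(f"degraded:  {degraded / total_objects:.3%}")   # 0.262%
print(f"misplaced: {misplaced / total_objects:.3%}")  # 5.908%
print(f"inactive:  {inactive / total_pgs:.3%}")       # 13.965%
```

So the report is at least internally consistent; the real question is why 143
PGs are stuck peering/activating.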
So, what happened, and what should I do now?
Thank you very much for any help
Kind regards,
Caspar
2018-06-07 13:33 GMT+02:00 Sage Weil <[email protected]>:
> On Wed, 6 Jun 2018, Caspar Smit wrote:
> > Hi all,
> >
> > We have a Luminous 12.2.2 cluster with 3 nodes and I recently added a
> > node to it.
> >
> > osd-max-backfills is at the default of 1, so backfilling didn't go very
> > fast, but that doesn't matter.
> >
> > Once it started backfilling everything looked ok:
> >
> > ~300 pgs in backfill_wait
> > ~10 pgs backfilling (roughly the number of new OSDs)
> >
> > But I noticed the degraded objects increasing a lot. I presume a pg that
> > is in the backfill_wait state doesn't accept any new writes anymore?
> > Hence the increasing degraded objects?
> >
> > So far so good, but once in a while I noticed a random OSD flapping (they
> > come back up automatically). This isn't because the disk is saturated,
> > but because of a driver/controller/kernel incompatibility which 'hangs'
> > the disk for a short time (scsi abort_task error in syslog).
> > Investigating further, I noticed this was already the case before the
> > node expansion.
> >
> > These flapping OSDs result in lots of PG states which are a bit
> > worrying:
> >
> > 109 active+remapped+backfill_wait
> > 80 active+undersized+degraded+remapped+backfill_wait
> > 51 active+recovery_wait+degraded+remapped
> > 41 active+recovery_wait+degraded
> > 27 active+recovery_wait+undersized+degraded+remapped
> > 14 active+undersized+remapped+backfill_wait
> > 4 active+undersized+degraded+remapped+backfilling
> >
> > I think the recovery_wait is more important than the backfill_wait, so
> > I'd like to prioritize these, because the recovery_wait was triggered by
> > the flapping OSDs.
>
> Just a note: this is fixed in mimic. Previously, we would choose the
> highest-priority PG to start recovery on at the time, but once recovery
> had started, the appearance of a new PG with a higher priority (e.g.,
> because it finished peering after the others) wouldn't preempt/cancel the
> other PG's recovery, so you would get behavior like the above.
>
> Mimic implements that preemption, so you should not see behavior like
> this. (If you do, then the function that assigns a priority score to a
> PG needs to be tweaked.)
>
> sage
>
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com