Hi Dan, saw the pull request, and can confirm your observations, at least partially. Comments inline.
On Thu, Sep 18, 2014 at 2:50 PM, Dan Van Der Ster <[email protected]> wrote:
>>> Do I understand your issue report correctly in that you have found
>>> setting osd_snap_trim_sleep to be ineffective, because it's being
>>> applied when iterating from PG to PG, rather than from snap to snap?
>>> If so, then I'm guessing that that can hardly be intentional…
>
> I'm beginning to agree with you on that guess. AFAICT, the normal behavior of
> the snap trimmer is to trim one single snap, the one which is in the
> snap_trimq but not yet in purged_snaps. So the only time the current sleep
> implementation could be useful is if we rm'd a snap across many PGs at once,
> e.g. rm a pool snap or an rbd snap. But those aren't a huge problem anyway
> since you'd at most need to trim O(100) PGs.

Hmm. I'm actually seeing this in a system where the problematic snaps
could *only* have been RBD snaps.

> We could move the snap trim sleep into the SnapTrimmer state machine, for
> example in ReplicatedPG::NotTrimming::react. This should allow other IOs to
> get through to the OSD, but of course the trimming PG would remain locked.
> And it would be locked for even longer now due to the sleep.
>
> To solve that we could limit the number of trims per instance of the
> SnapTrimmer, like I've done in this pull req:
> https://github.com/ceph/ceph/pull/2516
> Breaking out of the trimmer like that should allow IOs to the trimming PG to
> get through.
>
> The second aspect of this issue is why are the purged_snaps being lost to
> begin with. I've managed to reproduce that on my test cluster. All you have
> to do is create many pool snaps (e.g. of a nearly empty pool), then rmsnap
> all those snapshots. Then use crush reweight to move the PGs around. With
> debug_osd>=10, you will see "adding snap 1 to purged_snaps", which is one
> signature of this lost purged_snaps issue. To reproduce slow requests the
> number of snaps purged needs to be O(10000).

Hmmm, I'm not sure I can confirm that. I do see "adding snap X to
purged_snaps", but only after the snap has actually been purged; see
https://gist.github.com/fghaas/88db3cd548983a92aa35. Of course, the
fact that the OSD tries to trim a snap only to get an ENOENT is
probably indicative of something being fishy with the snap_trimq
and/or the purged_snaps list as well. (A condensed version of your
reproducer is at the bottom of this mail, for anyone else who wants
to give it a spin.)

> Looking forward to any ideas someone might have.

So am I. :)

Cheers,
Florian
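
P.S. Here is my reading of your reproducer, boiled down to a few
commands, for anyone else who wants to try. Pool name, snap count,
OSD id and weight are made-up placeholders, and the log path assumes
the default location; adjust to your test cluster. You said the purge
count needs to be O(10000) before slow requests actually show up; the
100 below is just to keep the example short.

  # throwaway pool with a pile of pool snapshots, all removed again
  ceph osd pool create snaptest 64
  for i in $(seq 1 100); do ceph osd pool mksnap snaptest snap-$i; done
  for i in $(seq 1 100); do ceph osd pool rmsnap snaptest snap-$i; done

  # turn up OSD debug logging first, so the interesting messages get logged
  ceph tell osd.\* injectargs '--debug_osd 10'

  # now move PGs around so they re-peer on other OSDs
  ceph osd crush reweight osd.0 0.5

  # look for snaps being "re-purged" even though they were trimmed long ago
  grep "adding snap" /var/log/ceph/ceph-osd.*.log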
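
P.P.S. Since the thread started from osd_snap_trim_sleep apparently
being applied in the wrong place: for completeness, the knob can be
flipped on a running cluster while testing the pull request, no
restarts needed. The 0.1 is an arbitrary example value, and the admin
socket path assumes the default location for osd.0 on the local host.

  ceph tell osd.\* injectargs '--osd_snap_trim_sleep 0.1'

  # verify what a given OSD actually picked up, via its admin socket
  ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep snap_trim_sleep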
