Hi Florian,

September 21 2014 3:33 PM, "Florian Haas" <[email protected]> wrote: 
> That said, I'm not sure that wip-9487-dumpling is the final fix to the
> issue. On the system where I am seeing the issue, even with the fix
> deployed, OSDs still not only go crazy snap trimming (which by itself
> would be understandable, as the system has indeed recently had
> thousands of snapshots removed), but they also still produce the
> previously seen ENOENT messages indicating they're trying to trim
> snaps that aren't there.
> 

You should be able to tell exactly how many snaps need to be trimmed. Check the 
current purged_snaps with

ceph pg x.y query

and also check the snap_trimq from debug_osd=10. The problem fixed in wip-9487 
is the (mis)communication of purged_snaps to a new OSD. But if purged_snaps is 
"correct" on your cluster (which it should be after the fix from Sage), and it 
still has lots of snaps to trim, then I believe the only thing to do is let all 
of those snaps get trimmed. (My other patch, linked earlier in this thread, 
might help by breaking that trimming work up into smaller pieces, but it was 
never tested.)
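
For concreteness, something like this is roughly how I'd check both (untested; 
x.y and N are just placeholders for the pg id and its primary osd, and the log 
path assumes the default location):

    # current purged_snaps for the pg (it's part of the pg_info_t that pg query dumps)
    ceph pg x.y query | grep -A2 purged_snaps

    # bump the osd log level, then look for snap_trimq in its log
    ceph tell osd.N injectargs '--debug_osd 10'
    grep snap_trimq /var/log/ceph/ceph-osd.N.log | tail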

Entering the realm of speculation, I wonder if your OSDs are getting 
interrupted, marked down or out, or crashing before they have the opportunity 
to persist purged_snaps. purged_snaps is updated in 
ReplicatedPG::WaitingOnReplicas::react, but if the primary is too busy to 
actually send that transaction to its peers, then eventually it (or the new 
primary) has to start the trimming over, and no progress is ever made. If this 
is what is happening on your cluster, then again, perhaps my osd_snap_trim_max 
patch could be a solution.
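
One rough (untested) way to check whether that's happening would be to watch 
for a while whether purged_snaps on an affected pg actually advances, and 
whether any OSDs flap in the meantime (x.y again being a placeholder):

    while true; do
        echo "=== $(date) ==="
        ceph pg x.y query | grep -A2 purged_snaps
        ceph osd tree | grep down    # any OSDs flapping?
        sleep 300
    done

If purged_snaps keeps growing, the trimming is at least making progress; if it 
stays put while OSDs keep getting marked down, that would fit the scenario 
above.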

Cheers, Dan

> That system, however, has PGs marked as recovering, not backfilling as
> in Dan's system. Not sure if wip-9487 falls short of fixing the issue
> at its root. Sage, whenever you have time, would you mind commenting?
> 
> Cheers,
> Florian