On 02/17/2018 12:48 PM, David Zafman wrote:
The commits below came after v12.2.2 and may impact this issue. When a
PG is active+clean+inconsistent, it means that scrub has detected issues
with one or more replicas of one or more objects. An unfound object is a
potentially temporary state in which the current set of available OSDs
doesn't allow an object to be recovered/backfilled/repaired. When the
primary OSD restarts, the record of unfound objects (an in-memory
structure) is reset so that the new set of peered OSDs can determine
again which objects are unfound.
I'm not clear in this scenario whether recovery failed to start,
recovery hung due to a bug, or recovery stopped (as designed)
because of the unfound object. The new recovery_unfound and
backfill_unfound states indicate that recovery has stopped due to
unfound objects.
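PG states are reported as '+'-joined flags (for example `active+clean+inconsistent`). As a minimal sketch of how one might tell the two conditions apart when scanning PG states, the Python below splits a state string and flags scrub inconsistency separately from recovery/backfill that stopped on unfound objects. The function name `classify_pg_state` and the second example state string are hypothetical, and `recovery_unfound`/`backfill_unfound` will only show up on releases that include the newer commits.

```python
def classify_pg_state(state: str) -> dict:
    """Split a '+'-joined PG state string and flag the conditions discussed above."""
    flags = set(state.split("+"))
    return {
        "active": "active" in flags,
        "clean": "clean" in flags,
        # Scrub detected a mismatch in one or more replicas/shards of one or more objects.
        "inconsistent": "inconsistent" in flags,
        # Recovery or backfill stopped (as designed) because of unfound objects.
        "stopped_on_unfound": bool(flags & {"recovery_unfound", "backfill_unfound"}),
    }


print(classify_pg_state("active+clean+inconsistent"))
# {'active': True, 'clean': True, 'inconsistent': True, 'stopped_on_unfound': False}
print(classify_pg_state("active+recovery_unfound+degraded"))  # hypothetical state string
# {'active': True, 'clean': False, 'inconsistent': False, 'stopped_on_unfound': True}
```

Using exact flag membership rather than substring matching avoids, for instance, treating `recovery_unfound` as a match for some other flag that merely contains "recovery".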

Thanks for your comments, David. I could certainly enable any additional
logging that might help clarify what's going on here, perhaps on the
primary OSD for a given PG?
I am still having a hard time understanding why these objects repeatedly
get flagged as unfound, when they are downloadable and contain correct
data whenever they are not in this state. It is a 4+2 EC pool, so I
would think it possible to reconstruct any missing EC chunks.
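The reasoning about a 4+2 pool can be made concrete. Assuming a maximum-distance-separable code such as the default jerasure Reed-Solomon plugin, any k of the k+m shards are enough to decode an object, so up to m shards can be lost. The helper name `can_reconstruct` is hypothetical, and this sketch covers only the coding arithmetic; it says nothing about whether peering considers the surviving shards available or authoritative.

```python
def can_reconstruct(k: int, m: int, available_shards: int) -> bool:
    """True if an object in a k+m erasure-coded pool can be decoded.

    Assumes an MDS code: any k of the k + m shards suffice, so up to m may be lost.
    """
    if not 0 <= available_shards <= k + m:
        raise ValueError("available_shards must be between 0 and k + m")
    return available_shards >= k


# 4+2 pool: losing two shards is survivable, losing three is not.
print(can_reconstruct(4, 2, 4))  # True
print(can_reconstruct(4, 2, 3))  # False
```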
It's an extensive problem: while I have been focusing on a couple of
specific PGs, the pool as a whole shows 2410 of its 4096 PGs as
inconsistent.
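To put the scale in proportion, a minimal sketch that computes the share of inconsistent PGs from a list of state strings. The input here is a hypothetical list built only to reproduce the counts reported above (2410 of 4096); in practice the states would come from the cluster's own PG listing.

```python
from typing import Iterable


def inconsistent_fraction(states: Iterable[str]) -> float:
    """Fraction of PGs whose '+'-joined state string includes 'inconsistent'."""
    state_list = list(states)
    if not state_list:
        return 0.0
    hits = sum(1 for s in state_list if "inconsistent" in s.split("+"))
    return hits / len(state_list)


# Hypothetical input reproducing the reported counts: 2410 inconsistent of 4096.
sample = ["active+clean+inconsistent"] * 2410 + ["active+clean"] * 1686
print(f"{inconsistent_fraction(sample):.1%}")  # 58.8%
```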
Minnesota Supercomputing Institute - g...@umn.edu
ceph-users mailing list