For the past few weeks I've been seeing a large number of PGs on our
main erasure-coded pool being flagged inconsistent, followed by them
becoming active+recovery_wait+inconsistent with unfound objects. The
cluster is currently running Luminous 12.2.2 but has in the past also
run its way through Firefly, Hammer and Jewel.
Here's a sample object from "ceph pg list_missing" (there are 150
unfound objects in this particular pg):
ceph health detail shows:
pg 70.467 is stuck unclean for 1004525.715896, current state
active+recovery_wait+inconsistent, last acting [449,233,336,323,259,193]
ceph pg 70.467 list_missing:
When I trace through the filesystem on each OSD, I find the associated
file present on each OSD but with size 0 bytes.
Interestingly, for the 3 OSDs for which "list_missing" shows locations
above (193,259,449), the timestamp of the 0-byte file is recent (within
last few weeks). For the other 3 OSDs (233,336,323), it's in the distant
past (08/2015 and 02/2016).
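The check described above (walking a PG's directory on an OSD and noting
every 0-byte file along with its timestamp) could be scripted roughly as
follows. This is only a sketch: the function names are made up for
illustration, and the directory passed in is whatever path the PG shard
occupies on that OSD's filestore, which I'm leaving as a parameter
rather than assuming a layout.

```python
import os
import stat
import time


def find_empty_files(pg_dir):
    """Return (path, mtime) for every 0-byte regular file under pg_dir,
    oldest first. pg_dir is the PG shard directory on one OSD (the exact
    location is an assumption left to the caller)."""
    hits = []
    for root, _dirs, files in os.walk(pg_dir):
        for name in files:
            path = os.path.join(root, name)
            st = os.lstat(path)  # lstat: do not follow symlinks
            if stat.S_ISREG(st.st_mode) and st.st_size == 0:
                hits.append((path, st.st_mtime))
    return sorted(hits, key=lambda h: h[1])


def format_report(hits):
    """Render one 'UTC-timestamp path' line per 0-byte file."""
    return [
        time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(mtime)) + " " + path
        for path, mtime in hits
    ]


# Example use on one OSD host (path is a placeholder):
#   for line in format_report(find_empty_files("/path/to/pg/shard/dir")):
#       print(line)
```

Sorting by mtime makes the split between the "recent" and "distant past"
groups easy to see when the output from all six OSDs is compared.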
All the unfound objects I've checked on this pg show the same pattern,
along with the "have" epoch showing as "0'0".
Apart from the disturbing prospect of data loss, I wonder why this
showed up so suddenly.
It seems to have been triggered by one OSD host failing over a long
weekend. By the time we looked at it on Monday, the cluster had
re-balanced enough data that I decided to simply leave it - we had long
wanted to evacuate a first host to convert to a newer OS release, as
well as Bluestore. Perhaps this was a bad choice, but the cluster
recovery appeared to be proceeding normally, and was apparently complete
a few days later. It was only around a week later that the unfound
objects started to appear.
All the unfound object file fragments I've tracked down so far have
their older members with timestamps in the same mid-2015 to mid-2016
period. I could be wrong but this really seems like a long-standing
problem has just been unearthed. I wonder if it could be connected to
this thread from early 2016, concerning a problem on the same cluster:
It's a long thread, but the 0-byte files sound very like the "orphaned
files" in that thread - related to performing a directory split while
handling links on a filename with the special long filename handling...
However, unlike that thread, I'm not finding any other files with
duplicate names in the hierarchy.
I'm not sure there's much else I can do besides record the names of any
unfound objects before resorting to "mark_unfound_lost delete" - any
suggestions for further research?
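For recording the names, something like the sketch below might do: it
takes the JSON text printed by "ceph pg <pgid> list_missing" and pulls
out one row per object. The field names used here ("objects", "oid",
"need", "have", "locations") are my assumption about the output format
and should be checked against what the cluster actually prints; the
function name is hypothetical.

```python
import json


def summarize_unfound(list_missing_json):
    """Extract name, need, have and locations for each listed object.

    Assumes the JSON has a top-level "objects" list and that each entry
    carries "oid", "need", "have" and "locations" (assumed field names).
    """
    data = json.loads(list_missing_json)
    rows = []
    for obj in data.get("objects", []):
        oid = obj.get("oid")
        # "oid" may be a nested dict holding the name, or a plain string.
        name = oid.get("oid") if isinstance(oid, dict) else oid
        rows.append({
            "name": name,
            "need": obj.get("need"),
            "have": obj.get("have"),
            "locations": obj.get("locations", []),
        })
    return rows


# Example use, with the command output saved to a file first:
#   with open("pg-70.467-list_missing.json") as f:
#       for row in summarize_unfound(f.read()):
#           print(row["name"], row["have"], row["locations"])
```

Keeping the saved JSON alongside the extracted names would preserve the
full record in case the extracted fields turn out not to be enough.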
Minnesota Supercomputing Institute - g...@umn.edu
ceph-users mailing list