I encountered this same issue on two different clusters running Hammer 0.94.9
last week. In both cases I was able to resolve it by deleting (moving) all
replicas of the unexpected clone manually and issuing a pg repair. Which
version did you see this on? A call stack for the resulting crash would also be
interesting, although troubleshooting further is probably less useful now that
you've resolved the problem. It's just a matter of curiosity at this point.
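For reference, the manual fix described above looked roughly like this (a
sketch only, assuming a Hammer-era FileStore on-disk layout; the OSD number,
PG id 3.61a, and rbd_data prefix are examples taken from this thread, and the
actual file under current/ must be located first):

```shell
# On each OSD holding a replica of the affected PG, stop the OSD daemon first
# (on Hammer this may be "service ceph stop osd.20" or an init script instead)
systemctl stop ceph-osd@20

# Locate the on-disk file for the unexpected clone inside the PG directory;
# the file name encodes the object name and the snapshot id
find /var/lib/ceph/osd/ceph-20/current/3.61a_head/ -name '*106dd406b8b4567*'

# Move the file out of the way (rather than deleting it) so it can be
# restored if the repair goes wrong
mv /var/lib/ceph/osd/ceph-20/current/3.61a_head/FOUND_FILE /root/pg-3.61a-backup/

# Restart the OSD and ask Ceph to repair the PG
systemctl start ceph-osd@20
ceph pg repair 3.61a
```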
Steve Taylor | Senior Software Engineer | StorageCraft Technology
On Tue, 2017-08-08 at 12:02 +0200, Stefan Priebe - Profihost AG wrote:
Am 08.08.2017 um 11:56 schrieb Gregory Farnum:
On Mon, Aug 7, 2017 at 11:55 PM Stefan Priebe - Profihost AG
How can I fix this one:
2017-08-08 08:42:52.265321 osd.20 [ERR] repair 3.61a
3:58654d3d:::rbd_data.106dd406b8b4567.000000000000018c:9d455 is an unexpected clone
2017-08-08 08:43:04.914640 mon.0 [INF] HEALTH_ERR; 1 pgs inconsistent; 1
pgs repair; 1 scrub errors
2017-08-08 08:43:33.470246 osd.20 [ERR] 3.61a repair 1 errors, 0 fixed
2017-08-08 08:44:04.915148 mon.0 [INF] HEALTH_ERR; 1 pgs inconsistent; 1
pgs repair; 1 scrub errors
If I manually delete the relevant files, ceph crashes. rados does not even
list those objects.
How can I fix this?
You've sent quite a few emails that have this story spread out, and I
think you've tried several different steps to repair it that have been a
bit difficult to track.
It would be helpful if you could put the whole story in one place and
explain very carefully exactly what you saw and how you responded. Actions
like manually copying around the wrong files, or files without matching
object info, could have done some very strange things.
Also, basic debugging stuff like what version you're running will help. :)
Also note that since you've said elsewhere you don't need this image, I
don't think it's going to hurt you to leave it like this for a bit
(though it will definitely mess up your monitoring).
I'm sorry about that. You're correct.
I was able to fix this just a few minutes ago by using ceph-objectstore-tool
with the remove operation to delete all leftover files. I did this on every
OSD holding the problematic PG. After that, Ceph was able to repair itself.
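Concretely, the removal step can be sketched as follows (a sketch, not the
exact commands used; the data path, journal path, and object identifier are
examples, the OSD must be stopped while ceph-objectstore-tool runs, and the
removal has to be repeated on every OSD that holds the PG):

```shell
# With the OSD stopped, list the objects in the problematic PG to find the
# unexpected clone (output is one JSON description per object)
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-20 \
    --journal-path /var/lib/ceph/osd/ceph-20/journal \
    --pgid 3.61a --op list

# Remove the leftover clone object, passing the JSON description printed
# by the list operation above
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-20 \
    --journal-path /var/lib/ceph/osd/ceph-20/journal \
    --pgid 3.61a 'OBJECT_JSON_FROM_LIST' remove

# Once done on all replicas, restart the OSDs and repair the PG
ceph pg repair 3.61a
```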
A better approach might be for Ceph to recover from an unexpected clone
automatically by simply deleting it.
ceph-users mailing list