I encountered this same issue on two different clusters running Hammer 0.94.9 
last week. In both cases I was able to resolve it by manually deleting 
(moving) all replicas of the unexpected clone and issuing a pg repair. Which 
version did you see this on? A call stack from the resulting crash would also 
be interesting, although further troubleshooting is probably less useful now 
that you've resolved the problem. It's just a matter of curiosity at this 
point.
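For anyone following along, the repair step described above can be sketched roughly as follows; the pg id 3.61a is taken from the log output quoted later in this thread and will differ in your cluster:

```shell
# Identify the inconsistent PG(s) reported by scrub.
ceph health detail | grep inconsistent

# Ask the primary OSD to repair the PG (3.61a is the PG from this thread).
ceph pg repair 3.61a

# Watch the cluster log to see whether the scrub errors were fixed.
ceph -w
```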



Steve Taylor | Senior Software Engineer | StorageCraft Technology 
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 |


On Tue, 2017-08-08 at 12:02 +0200, Stefan Priebe - Profihost AG wrote:

Hello Greg,

On 08.08.2017 at 11:56, Gregory Farnum wrote:

On Mon, Aug 7, 2017 at 11:55 PM Stefan Priebe - Profihost AG
<s.pri...@profihost.ag> wrote:


    How can I fix this one:

    2017-08-08 08:42:52.265321 osd.20 [ERR] repair 3.61a
    3:58654d3d:::rbd_data.106dd406b8b4567.000000000000018c:9d455 is an
    unexpected clone
    2017-08-08 08:43:04.914640 mon.0 [INF] HEALTH_ERR; 1 pgs inconsistent; 1
    pgs repair; 1 scrub errors
    2017-08-08 08:43:33.470246 osd.20 [ERR] 3.61a repair 1 errors, 0 fixed
    2017-08-08 08:44:04.915148 mon.0 [INF] HEALTH_ERR; 1 pgs inconsistent; 1
    scrub errors

    If I just delete the relevant files manually, Ceph crashes, and rados
    does not list those objects at all.

    How can I fix this?

You've sent quite a few emails that have this story spread out, and I
think you've tried several different steps to repair it that have been a
bit difficult to track.

It would be helpful if you could put the whole story in one place and
explain very carefully exactly what you saw and how you responded. Stuff
like manually copying around the wrong files, or files without a
matching object info, could have done some very strange things.
Also, basic debugging stuff like what version you're running will help. :)

Also note that since you've said elsewhere you don't need this image, I
don't think it's going to hurt you to leave it like this for a bit
(though it will definitely mess up your monitoring).

I'm sorry about that. You're correct.

I was able to fix this just a few minutes ago by using
ceph-objectstore-tool with the remove operation to delete all leftover
clone objects.

I did this on all OSDs holding the problematic pg. After that, Ceph was
able to repair itself.
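For anyone hitting the same error, the procedure above might look roughly like this. The OSD id, data path, and journal path are assumptions for a typical Hammer-era filestore deployment; the object name and pg id are taken from the log output earlier in this thread:

```shell
# Stop the OSD holding a replica of the problematic PG
# (ceph-objectstore-tool needs exclusive access to the object store).
service ceph stop osd.20

# List the leftover object in JSON form; this JSON spec is what
# the remove operation expects.
ceph-objectstore-tool \
  --data-path /var/lib/ceph/osd/ceph-20 \
  --journal-path /var/lib/ceph/osd/ceph-20/journal \
  --pgid 3.61a \
  --op list rbd_data.106dd406b8b4567.000000000000018c

# Remove the unexpected clone, passing the exact JSON line printed above.
OBJ='<JSON spec printed by --op list>'
ceph-objectstore-tool \
  --data-path /var/lib/ceph/osd/ceph-20 \
  --journal-path /var/lib/ceph/osd/ceph-20/journal \
  "$OBJ" remove

# Restart the OSD, repeat on the other replicas, then repair the PG.
service ceph start osd.20
ceph pg repair 3.61a
```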

A better approach might be for Ceph to recover from an unexpected clone
on its own by simply deleting it.

ceph-users mailing list
