You're running 0.87-6. There were various fixes for this problem in Firefly. Were any of these snapshots created on an early version of Firefly?
So far, every fix for this issue has involved the developers. I'd suggest talking to some devs on IRC, or posting to the ceph-devel mailing list. In my own experience, I had to delete the affected PGs and force-create them. Hopefully there's a better answer now.

On Fri, Nov 7, 2014 at 8:10 PM, Chu Duc Minh <[email protected]> wrote:
> One of my OSDs has problems and can NOT be started. I tried to start it
> many times, but it always crashes a few minutes after starting.
> I can think of two reasons it might crash:
> 1. A read/write request hits this OSD and, due to a corrupted
> volume/snapshot/parent-image/..., it crashes.
> 2. The recovery process can NOT work properly due to corrupted
> volumes/snapshots/parent-images/...
>
> After many retries and checking the logs, I suspect reason (2) is the
> main cause. If (1) were the main cause, other OSDs (containing the buggy
> volume/snapshot) would crash too.
>
> State of my ceph cluster (just a few seconds before the crash):
>
> 111/57706299 objects degraded (0.001%)
> 14918 active+clean
> 1 active+clean+scrubbing+deep
> 52 active+recovery_wait+degraded
> 2 active+recovering+degraded
>
> PS: I've attached the crash-dump log of that OSD to this email for your
> information.
>
> Thank you!
>
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
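For reference, the PG-deletion workaround mentioned above looks roughly like the sketch below. This assumes a Giant-era (0.87) cluster and uses a hypothetical PG id (2.5) and OSD (osd.3); deleting a PG's data discards its objects, so treat this as a last resort done with developer guidance, not a recommended procedure:

```shell
# Identify the degraded/stuck PGs and which OSDs they map to.
ceph health detail
ceph pg dump_stuck inactive

# Inspect one of the affected PGs (hypothetical PG id 2.5).
ceph pg 2.5 query

# With the owning OSD (hypothetical osd.3) stopped, remove that OSD's copy
# of the PG from its data directory, e.g.:
#   /var/lib/ceph/osd/ceph-3/current/2.5_head/
# Then tell the monitors to recreate the PG empty:
ceph pg force_create_pg 2.5
```

The recreated PG comes back empty, so any objects it held are lost; this only gets the cluster back to active+clean.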
