ceph pg query says all the OSDs are being probed. If those 6 OSDs stay up, it probably just needs some time; the OSDs need to stay up longer than 15 minutes. If any of them are getting marked down at all, that will cause problems. I'd like to see the past intervals in the recovery state getting smaller. Each of those entries is potential history that needs to be reconciled, so if that array is shrinking, recovery is proceeding.
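A quick way to watch that array shrink is to count the past-interval entries in the pg query output between runs. This is a minimal sketch, not a supported tool: the sample JSON below is hypothetical, and the exact layout of recovery_state varies by Ceph release, so check it against your own ceph pg 0.37 query output.

```python
import json

# Hypothetical excerpt of `ceph pg 0.37 query` output; the real
# structure differs between Ceph releases, so adjust the keys as needed.
sample = """
{
  "recovery_state": [
    {
      "name": "Started/Primary/Peering",
      "enter_time": "2015-04-22 11:00:00",
      "past_intervals": [
        {"first": 100, "last": 110},
        {"first": 111, "last": 120},
        {"first": 121, "last": 130}
      ],
      "probing_osds": [0, 3, 7, 9, 12, 15]
    }
  ]
}
"""

def count_past_intervals(query_json):
    """Count past-interval entries still awaiting reconciliation."""
    doc = json.loads(query_json)
    for state in doc.get("recovery_state", []):
        # Only the peering state carries a past_intervals array.
        if "past_intervals" in state:
            return len(state["past_intervals"])
    return 0

print(count_past_intervals(sample))  # 3 entries in this sample
```

Run it against two snapshots of the query output taken a few minutes apart; a falling count means recovery is making progress.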
You could try pushing it a bit with a ceph pg scrub 0.37. If that finishes without any improvement, try ceph pg deep-scrub 0.37. Sometimes it helps move things along faster, and sometimes it doesn't.

On Wed, Apr 22, 2015 at 11:54 AM, MEGATEL / Rafał Gawron <[email protected]> wrote:
> All OSDs are working fine now:
> ceph osd tree
> ID  WEIGHT     TYPE NAME   UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1  1080.71985 root default
> -2   120.07999     host s1
>  0    60.03999         osd.0    up  1.00000          1.00000
>  1    60.03999         osd.1    up  1.00000          1.00000
> -3   120.07999     host s2
>  2    60.03999         osd.2    up  1.00000          1.00000
>  3    60.03999         osd.3    up  1.00000          1.00000
> -4   120.07999     host s3
>  4    60.03999         osd.4    up  1.00000          1.00000
>  5    60.03999         osd.5    up  1.00000          1.00000
> -5   120.07999     host s4
>  6    60.03999         osd.6    up  1.00000          1.00000
>  7    60.03999         osd.7    up  1.00000          1.00000
> -6   120.07999     host s5
>  9    60.03999         osd.9    up  1.00000          1.00000
>  8    60.03999         osd.8    up  1.00000          1.00000
> -7   120.07999     host s6
> 10    60.03999         osd.10   up  1.00000          1.00000
> 11    60.03999         osd.11   up  1.00000          1.00000
> -8   120.07999     host s7
> 12    60.03999         osd.12   up  1.00000          1.00000
> 13    60.03999         osd.13   up  1.00000          1.00000
> -9   120.07999     host s8
> 14    60.03999         osd.14   up  1.00000          1.00000
> 15    60.03999         osd.15   up  1.00000          1.00000
> -10  120.07999     host s9
> 17    60.03999         osd.17   up  1.00000          1.00000
> 16    60.03999         osd.16   up  1.00000          1.00000
>
> Earlier I had a power failure and my cluster was down.
> After coming back up it was recovering, but now I have:
> 1 pgs incomplete
> 1 pgs stuck inactive
> 1 pgs stuck unclean
>
> The cluster can't recover this PG.
> I tried taking some OSDs out and adding them back to my cluster, but the
> recovery after that didn't rebuild it.
>
> ------------------------------
> *From:* Craig Lewis <[email protected]>
> *Sent:* 22 April 2015 20:40
> *To:* MEGATEL / Rafał Gawron
> *Subject:* Re: Odp.: [ceph-users] CEPH 1 pgs incomplete
>
> So you have flapping OSDs. None of the 6 OSDs involved in that PG are
> staying up long enough to complete the recovery.
>
> What's happened is that, because of how quickly the OSDs are coming up
> and failing, no single OSD has a complete copy of the data. There should
> be a complete copy of the data, but different OSDs have different chunks
> of it.
>
> Figure out why those 6 OSDs are failing, and Ceph should recover. Do
> you see anything interesting in those OSD logs? If not, you might need to
> increase the logging levels.
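Putting the suggestions from this thread together, the sequence below is one way to nudge the PG and gather more information. The PG id 0.37 comes from the thread; osd.0 is just a placeholder, so substitute one of the six OSDs actually acting for the PG. The debug levels shown are only a starting point.

```shell
# Nudge the PG with a scrub, then a deep scrub if nothing improves:
ceph pg scrub 0.37
ceph pg deep-scrub 0.37

# Re-check the recovery state afterwards and compare past_intervals:
ceph pg 0.37 query | less

# If the OSDs keep flapping, raise the log level on one of the PG's
# OSDs (osd.0 here is a placeholder) and watch its log for clues:
ceph tell osd.0 injectargs '--debug-osd 20 --debug-ms 1'
tail -f /var/log/ceph/ceph-osd.0.log
```

Remember to drop the debug levels back down afterwards, since debug-osd 20 is very verbose and will grow the logs quickly.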
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
