ceph pg query says all the OSDs are being probed. If those 6 OSDs stay up, it probably just needs some time; the OSDs need to stay up longer than 15 minutes. If any of them are getting marked down at all, that will cause problems. I'd like to see the past intervals in the recovery state getting smaller. Each of those entries is potential history that needs to be reconciled, so if that array is shrinking, recovery is proceeding.
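A quick way to watch that array shrink is to count the past-interval entries in the pg query output between runs. This is a minimal sketch, not a supported tool: the sample JSON below is hypothetical, and the exact layout of recovery_state varies by Ceph release, so check it against your own ceph pg 0.37 query output.

```python
import json

# Hypothetical excerpt of `ceph pg 0.37 query` output; the real
# structure differs between Ceph releases, so adjust the keys as needed.
sample = """
{
  "recovery_state": [
    {
      "name": "Started/Primary/Peering",
      "enter_time": "2015-04-22 11:00:00",
      "past_intervals": [
        {"first": 100, "last": 110},
        {"first": 111, "last": 120},
        {"first": 121, "last": 130}
      ],
      "probing_osds": [0, 3, 7, 9, 12, 15]
    }
  ]
}
"""

def count_past_intervals(query_json):
    """Count past-interval entries still awaiting reconciliation."""
    doc = json.loads(query_json)
    for state in doc.get("recovery_state", []):
        # Only the peering state carries a past_intervals array.
        if "past_intervals" in state:
            return len(state["past_intervals"])
    return 0

print(count_past_intervals(sample))  # 3 entries in this sample
```

Run it against two snapshots of the query output taken a few minutes apart; a falling count means recovery is making progress.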
You could try pushing it a bit with a ceph pg scrub 0.37. If that finishes without any improvement, try ceph pg deep-scrub 0.37. Sometimes it helps move things along faster, and sometimes it doesn't.

On Wed, Apr 22, 2015 at 11:54 AM, MEGATEL / Rafał Gawron <[email protected]> wrote:
> All OSDs are working fine now:
> ceph osd tree
> ID  WEIGHT     TYPE NAME   UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1  1080.71985 root default
> -2   120.07999     host s1
>  0    60.03999         osd.0    up  1.00000          1.00000
>  1    60.03999         osd.1    up  1.00000          1.00000
> -3   120.07999     host s2
>  2    60.03999         osd.2    up  1.00000          1.00000
>  3    60.03999         osd.3    up  1.00000          1.00000
> -4   120.07999     host s3
>  4    60.03999         osd.4    up  1.00000          1.00000
>  5    60.03999         osd.5    up  1.00000          1.00000
> -5   120.07999     host s4
>  6    60.03999         osd.6    up  1.00000          1.00000
>  7    60.03999         osd.7    up  1.00000          1.00000
> -6   120.07999     host s5
>  9    60.03999         osd.9    up  1.00000          1.00000
>  8    60.03999         osd.8    up  1.00000          1.00000
> -7   120.07999     host s6
> 10    60.03999         osd.10   up  1.00000          1.00000
> 11    60.03999         osd.11   up  1.00000          1.00000
> -8   120.07999     host s7
> 12    60.03999         osd.12   up  1.00000          1.00000
> 13    60.03999         osd.13   up  1.00000          1.00000
> -9   120.07999     host s8
> 14    60.03999         osd.14   up  1.00000          1.00000
> 15    60.03999         osd.15   up  1.00000          1.00000
> -10  120.07999     host s9
> 17    60.03999         osd.17   up  1.00000          1.00000
> 16    60.03999         osd.16   up  1.00000          1.00000
>
> Earlier I had a power failure and my cluster was down.
> After coming back up it was recovering, but now I have:
> 1 pgs incomplete
> 1 pgs stuck inactive
> 1 pgs stuck unclean
>
> The cluster can't recover this PG.
> I tried taking some OSDs out and adding them back to my cluster, but the
> recovery after that didn't rebuild it.
>
> ------------------------------
> *From:* Craig Lewis <[email protected]>
> *Sent:* 22 April 2015 20:40
> *To:* MEGATEL / Rafał Gawron
> *Subject:* Re: Odp.: [ceph-users] CEPH 1 pgs incomplete
>
> So you have flapping OSDs. None of the 6 OSDs involved in that PG are
> staying up long enough to complete the recovery.
>
> What's happened is that, because of how quickly the OSDs are coming up
> and failing, no single OSD has a complete copy of the data. There should
> be a complete copy of the data, but different OSDs have different chunks
> of it.
>
> Figure out why those 6 OSDs are failing, and Ceph should recover. Do
> you see anything interesting in those OSD logs? If not, you might need to
> increase the logging levels.
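Putting the suggestions from this thread together, the sequence below is one way to nudge the PG and gather more information. The PG id 0.37 comes from the thread; osd.0 is just a placeholder, so substitute one of the six OSDs actually acting for the PG. The debug levels shown are only a starting point.

```shell
# Nudge the PG with a scrub, then a deep scrub if nothing improves:
ceph pg scrub 0.37
ceph pg deep-scrub 0.37

# Re-check the recovery state afterwards and compare past_intervals:
ceph pg 0.37 query | less

# If the OSDs keep flapping, raise the log level on one of the PG's
# OSDs (osd.0 here is a placeholder) and watch its log for clues:
ceph tell osd.0 injectargs '--debug-osd 20 --debug-ms 1'
tail -f /var/log/ceph/ceph-osd.0.log
```

Remember to drop the debug levels back down afterwards, since debug-osd 20 is very verbose and will grow the logs quickly.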
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
