We have upgraded from Hammer to Jewel, and then to Luminous 12.2.2 as of
today. During the Hammer-to-Jewel upgrade we lost two host servers and let
the cluster rebalance/recover; it ran out of space and stalled. We then
added three new host servers and again let the cluster rebalance/recover.

At some point during that process we ended up with 4 pgs that cannot be
repaired using "ceph pg repair xx.xx". I ran "ceph pg 11.720 query" against
one of them, and from what I can tell the info on the remaining copies
matches, but the pg is blocked from being marked clean.
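
For reference, this is how I have been poking at the stuck pgs. The jq
paths are my reading of the 12.2.2 query JSON, so please correct me if I am
checking the wrong fields:

# current peering state and anything it reports as blocking
ceph pg 11.720 query | jq '.recovery_state'

# compare the primary's object count against each peer's
ceph pg 11.720 query | jq '.info.stats.stat_sum.num_objects,
    .peer_info[].stats.stat_sum.num_objects'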

I keep seeing references to the ceph-objectstore-tool export/import method,
but I cannot find a step-by-step procedure for our particular predicament
(my best reconstruction is below). It may also be acceptable for us to
simply lose the data, if it can't be extracted, so that we can at least
return the cluster to a healthy state. Any thoughts?
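
Here is what I have pieced together from list archives so far. This is
untested; the pgid and OSD ids are from one of our stuck pgs (acting
[21,10]), and the paths assume the default FileStore layout left over from
our Hammer days:

# stop the OSD that I believe holds the most complete copy of the pg
systemctl stop ceph-osd@21

# export that copy to a file
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-21 \
    --journal-path /var/lib/ceph/osd/ceph-21/journal \
    --pgid 11.720 --op export --file /root/pg.11.720.export

# stop the peer, remove its copy, and import the export in its place
systemctl stop ceph-osd@10
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-10 \
    --journal-path /var/lib/ceph/osd/ceph-10/journal \
    --pgid 11.720 --op remove
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-10 \
    --journal-path /var/lib/ceph/osd/ceph-10/journal \
    --pgid 11.720 --op import --file /root/pg.11.720.export

# restart both OSDs and let peering retry
systemctl start ceph-osd@21
systemctl start ceph-osd@10

I have also seen "--op mark-complete" mentioned for exactly this
incomplete-but-data-looks-intact situation, but I am not sure which OSD it
should be run against. And if we do write the data off, am I right that
"ceph pg force_create_pg 11.720" is the way to recreate an empty pg on this
release?
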
ceph -s output:
cluster:
health: HEALTH_ERR
Reduced data availability: 4 pgs inactive, 4 pgs incomplete
Degraded data redundancy: 4 pgs unclean
4 stuck requests are blocked > 4096 sec
too many PGs per OSD (2549 > max 200)
services:
mon: 3 daemons, quorum ukpixmon1,ukpixmon2,ukpixmon3
mgr: ukpixmon1(active), standbys: ukpixmon3, ukpixmon2
osd: 43 osds: 43 up, 43 in
rgw: 3 daemons active
data:
pools: 12 pools, 37904 pgs
objects: 8148k objects, 10486 GB
usage: 21530 GB used, 135 TB / 156 TB avail
pgs: 0.011% pgs not active
37900 active+clean
4 incomplete
ceph osd tree output:
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 156.10268 root default
-2 32.57996 host osdhost1
0 3.62000 osd.0 up 1.00000 1.00000
1 3.62000 osd.1 up 1.00000 1.00000
2 3.62000 osd.2 up 1.00000 1.00000
3 3.62000 osd.3 up 1.00000 1.00000
4 3.62000 osd.4 up 1.00000 1.00000
5 3.62000 osd.5 up 1.00000 1.00000
6 3.62000 osd.6 up 1.00000 1.00000
7 3.62000 osd.7 up 1.00000 1.00000
8 3.62000 osd.8 up 1.00000 1.00000
-3 25.33997 host osdhost2
9 3.62000 osd.9 up 1.00000 1.00000
10 3.62000 osd.10 up 1.00000 1.00000
11 3.62000 osd.11 up 1.00000 1.00000
12 3.62000 osd.12 up 1.00000 1.00000
15 3.62000 osd.15 up 1.00000 1.00000
16 3.62000 osd.16 up 1.00000 1.00000
17 3.62000 osd.17 up 1.00000 1.00000
-8 32.72758 host osdhost6
14 3.63640 osd.14 up 1.00000 1.00000
21 3.63640 osd.21 up 1.00000 1.00000
23 3.63640 osd.23 up 1.00000 1.00000
26 3.63640 osd.26 up 1.00000 1.00000
32 3.63640 osd.32 up 1.00000 1.00000
33 3.63640 osd.33 up 1.00000 1.00000
34 3.63640 osd.34 up 1.00000 1.00000
35 3.63640 osd.35 up 1.00000 1.00000
36 3.63640 osd.36 up 1.00000 1.00000
-9 32.72758 host osdhost7
19 3.63640 osd.19 up 1.00000 1.00000
37 3.63640 osd.37 up 1.00000 1.00000
38 3.63640 osd.38 up 1.00000 1.00000
39 3.63640 osd.39 up 1.00000 1.00000
40 3.63640 osd.40 up 1.00000 1.00000
41 3.63640 osd.41 up 1.00000 1.00000
42 3.63640 osd.42 up 1.00000 1.00000
43 3.63640 osd.43 up 1.00000 1.00000
44 3.63640 osd.44 up 1.00000 1.00000
-7 32.72758 host osdhost8
20 3.63640 osd.20 up 1.00000 1.00000
45 3.63640 osd.45 up 1.00000 1.00000
46 3.63640 osd.46 up 1.00000 1.00000
47 3.63640 osd.47 up 1.00000 1.00000
48 3.63640 osd.48 up 1.00000 1.00000
49 3.63640 osd.49 up 1.00000 1.00000
50 3.63640 osd.50 up 1.00000 1.00000
51 3.63640 osd.51 up 1.00000 1.00000
52 3.63640 osd.52 up 1.00000 1.00000
ceph health detail output:
HEALTH_ERR Reduced data availability: 4 pgs inactive, 4 pgs incomplete;
Degraded data redundancy: 4 pgs unclean; 4 stuck requests are blocked > 4096
sec; too many PGs per OSD (2549 > max 200)
PG_AVAILABILITY Reduced data availability: 4 pgs inactive, 4 pgs incomplete
pg 11.720 is incomplete, acting [21,10]
pg 11.9ab is incomplete, acting [14,2]
pg 11.9fb is incomplete, acting [32,43]
pg 11.c13 is incomplete, acting [42,26]
PG_DEGRADED Degraded data redundancy: 4 pgs unclean
pg 11.720 is stuck unclean since forever, current state incomplete, last
acting [21,10]
pg 11.9ab is stuck unclean since forever, current state incomplete, last
acting [14,2]
pg 11.9fb is stuck unclean since forever, current state incomplete, last
acting [32,43]
pg 11.c13 is stuck unclean since forever, current state incomplete, last
acting [42,26]
REQUEST_STUCK 4 stuck requests are blocked > 4096 sec
4 ops are blocked > 33554.4 sec
osds 21,26,32,42 have stuck requests > 33554.4 sec
TOO_MANY_PGS too many PGs per OSD (2549 > max 200)
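
(Separately: I assume the TOO_MANY_PGS error is just fallout from our old
pool layout, since pg_num cannot be reduced in this release. Unless there
is a better option, my plan is to raise the limit to quiet it, something
like:

ceph tell mon.* injectargs '--mon_max_pg_per_osd 3000'

though I understand the real fix is migrating data into pools with saner
pg counts.)
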
-Brent