Hi all,

How do I get my Ceph Cluster back to a healthy state?

root@ceph-admin-storage:~# ceph -v
ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)
root@ceph-admin-storage:~# ceph -s
    cluster 6b481875-8be5-4508-b075-e1f660fd7b33
     health HEALTH_WARN 4 pgs incomplete; 4 pgs stuck inactive; 4 pgs stuck unclean
     monmap e2: 3 mons at {ceph-1-storage=10.65.150.101:6789/0,ceph-2-storage=10.65.150.102:6789/0,ceph-3-storage=10.65.150.103:6789/0}, election epoch 5010, quorum 0,1,2 ceph-1-storage,ceph-2-storage,ceph-3-storage
     osdmap e30748: 55 osds: 55 up, 55 in
      pgmap v10800465: 6144 pgs, 3 pools, 11002 GB data, 2762 kobjects
            22077 GB used, 79933 GB / 102010 GB avail
                6138 active+clean
                   4 incomplete
                   2 active+clean+replay
root@ceph-admin-storage:~# ceph health detail
HEALTH_WARN 4 pgs incomplete; 4 pgs stuck inactive; 4 pgs stuck unclean
pg 2.92 is stuck inactive since forever, current state incomplete, last acting [8,13]
pg 2.c1 is stuck inactive since forever, current state incomplete, last acting [13,7]
pg 2.e3 is stuck inactive since forever, current state incomplete, last acting [20,7]
pg 2.587 is stuck inactive since forever, current state incomplete, last acting [13,5]
pg 2.92 is stuck unclean since forever, current state incomplete, last acting [8,13]
pg 2.c1 is stuck unclean since forever, current state incomplete, last acting [13,7]
pg 2.e3 is stuck unclean since forever, current state incomplete, last acting [20,7]
pg 2.587 is stuck unclean since forever, current state incomplete, last acting [13,5]
pg 2.587 is incomplete, acting [13,5]
pg 2.e3 is incomplete, acting [20,7]
pg 2.c1 is incomplete, acting [13,7]
pg 2.92 is incomplete, acting [8,13]
root@ceph-admin-storage:~# ceph pg dump_stuck inactive
ok
pg_stat    objects    mip    degr    unf    bytes    log    disklog    state    state_stamp    v    reported    up    up_primary    acting    acting_primary    last_scrub    scrub_stamp    last_deep_scrub    deep_scrub_stamp
2.92    0    0    0    0    0    0    0    incomplete    2014-08-08 12:39:20.204592    0'0    30748:7729    [8,13]    8    [8,13]    8    13503'1390419    2014-06-26 01:57:48.727625    13503'1390419    2014-06-22 01:57:30.114186
2.c1    0    0    0    0    0    0    0    incomplete    2014-08-08 12:39:18.846542    0'0    30748:7117    [13,7]    13    [13,7]    13    13503'1687017    2014-06-26 20:52:51.249864    13503'1687017    2014-06-22 14:24:22.633554
2.e3    0    0    0    0    0    0    0    incomplete    2014-08-08 12:39:29.311552    0'0    30748:8027    [20,7]    20    [20,7]    20    13503'1398727    2014-06-26 07:03:25.899254    13503'1398727    2014-06-21 07:02:31.393053
2.587    0    0    0    0    0    0    0    incomplete    2014-08-08 12:39:19.715724    0'0    30748:7060    [13,5]    13    [13,5]    13    13646'1542934    2014-06-26 07:48:42.089935    13646'1542934    2014-06-22 07:46:20.363695
root@ceph-admin-storage:~# ceph osd tree
# id    weight    type name    up/down    reweight
-1    99.7    root default
-8    51.06        room room0
-2    19.33            host ceph-1-storage
0    0.91                osd.0    up    1
2    0.91                osd.2    up    1
3    0.91                osd.3    up    1
4    1.82                osd.4    up    1
9    1.36                osd.9    up    1
11    0.68                osd.11    up    1
6    3.64                osd.6    up    1
5    1.82                osd.5    up    1
7    3.64                osd.7    up    1
8    3.64                osd.8    up    1
-3    20            host ceph-2-storage
14    3.64                osd.14    up    1
18    1.36                osd.18    up    1
19    1.36                osd.19    up    1
15    3.64                osd.15    up    1
1    3.64                osd.1    up    1
12    3.64                osd.12    up    1
22    0.68                osd.22    up    1
23    0.68                osd.23    up    1
26    0.68                osd.26    up    1
36    0.68                osd.36    up    1
-4    11.73            host ceph-5-storage
32    0.27                osd.32    up    1
37    0.27                osd.37    up    1
42    0.27                osd.42    up    1
43    1.82                osd.43    up    1
44    1.82                osd.44    up    1
45    1.82                osd.45    up    1
46    1.82                osd.46    up    1
47    1.82                osd.47    up    1
48    1.82                osd.48    up    1
-9    48.64        room room1
-5    15.92            host ceph-3-storage
24    1.82                osd.24    up    1
25    1.82                osd.25    up    1
29    1.36                osd.29    up    1
10    3.64                osd.10    up    1
13    3.64                osd.13    up    1
20    3.64                osd.20    up    1
-6    20            host ceph-4-storage
34    3.64                osd.34    up    1
38    1.36                osd.38    up    1
39    1.36                osd.39    up    1
16    3.64                osd.16    up    1
30    0.68                osd.30    up    1
35    3.64                osd.35    up    1
17    3.64                osd.17    up    1
28    0.68                osd.28    up    1
31    0.68                osd.31    up    1
33    0.68                osd.33    up    1
-7    12.72            host ceph-6-storage
49    0.45                osd.49    up    1
50    0.45                osd.50    up    1
51    0.45                osd.51    up    1
52    0.45                osd.52    up    1
53    1.82                osd.53    up    1
54    1.82                osd.54    up    1
55    1.82                osd.55    up    1
56    1.82                osd.56    up    1
57    1.82                osd.57    up    1
58    1.82                osd.58    up    1

What I have tried so far (each command run for all four pgs / the affected osds):
ceph pg repair 2.587 (likewise for 2.e3, 2.c1, 2.92)
ceph pg force_create_pg 2.587 (likewise for 2.e3, 2.c1, 2.92)
ceph osd lost 5 --yes-i-really-mean-it (likewise for osds 7, 8, 13, 20)
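
None of the above changed the pg states. For reference, this is how I have been collecting the detailed peering state of the affected pgs (`ceph pg <pgid> query` is a standard command in Firefly; the `recovery_state` section of its JSON output is supposed to show why peering cannot complete, e.g. which osds or past intervals it is still waiting for):

```shell
# Inspect one incomplete pg in detail; look at the "recovery_state"
# section of the JSON output for the reason the pg stays incomplete.
ceph pg 2.587 query

# Capture the same information for all four affected pgs:
for pg in 2.92 2.c1 2.e3 2.587; do
    ceph pg "$pg" query > "pg-$pg-query.json"
done
```

I can post the query output for any of the pgs if that helps with the diagnosis.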

The history in brief:
I installed Cuttlefish and upgraded to Dumpling, then to Emperor. The cluster was healthy. I may have made a mistake while repairing 8 broken osds; from then on I have had incomplete pgs. Finally I upgraded from Emperor to Firefly.

Regards,
Mike
--------------------------------------------------------------------------------------------------
Bayerischer Rundfunk; Rundfunkplatz 1; 80335 München
Telefon: +49 89 590001; E-Mail: [email protected]; Website: http://www.BR.de