Hi all,
How do I get my Ceph cluster back to a healthy state?
root@ceph-admin-storage:~# ceph -v
ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)
root@ceph-admin-storage:~# ceph -s
cluster 6b481875-8be5-4508-b075-e1f660fd7b33
health HEALTH_WARN 4 pgs incomplete; 4 pgs stuck inactive; 4 pgs stuck unclean
monmap e2: 3 mons at {ceph-1-storage=10.65.150.101:6789/0,ceph-2-storage=10.65.150.102:6789/0,ceph-3-storage=10.65.150.103:6789/0}, election epoch 5010, quorum 0,1,2 ceph-1-storage,ceph-2-storage,ceph-3-storage
osdmap e30748: 55 osds: 55 up, 55 in
pgmap v10800465: 6144 pgs, 3 pools, 11002 GB data, 2762 kobjects
22077 GB used, 79933 GB / 102010 GB avail
6138 active+clean
4 incomplete
2 active+clean+replay
root@ceph-admin-storage:~# ceph health detail
HEALTH_WARN 4 pgs incomplete; 4 pgs stuck inactive; 4 pgs stuck unclean
pg 2.92 is stuck inactive since forever, current state incomplete, last acting [8,13]
pg 2.c1 is stuck inactive since forever, current state incomplete, last acting [13,7]
pg 2.e3 is stuck inactive since forever, current state incomplete, last acting [20,7]
pg 2.587 is stuck inactive since forever, current state incomplete, last acting [13,5]
pg 2.92 is stuck unclean since forever, current state incomplete, last acting [8,13]
pg 2.c1 is stuck unclean since forever, current state incomplete, last acting [13,7]
pg 2.e3 is stuck unclean since forever, current state incomplete, last acting [20,7]
pg 2.587 is stuck unclean since forever, current state incomplete, last acting [13,5]
pg 2.587 is incomplete, acting [13,5]
pg 2.e3 is incomplete, acting [20,7]
pg 2.c1 is incomplete, acting [13,7]
pg 2.92 is incomplete, acting [8,13]
root@ceph-admin-storage:~# ceph pg dump_stuck inactive
ok
pg_stat objects mip degr unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
2.92 0 0 0 0 0 0 0 incomplete 2014-08-08 12:39:20.204592 0'0 30748:7729 [8,13] 8 [8,13] 8 13503'1390419 2014-06-26 01:57:48.727625 13503'1390419 2014-06-22 01:57:30.114186
2.c1 0 0 0 0 0 0 0 incomplete 2014-08-08 12:39:18.846542 0'0 30748:7117 [13,7] 13 [13,7] 13 13503'1687017 2014-06-26 20:52:51.249864 13503'1687017 2014-06-22 14:24:22.633554
2.e3 0 0 0 0 0 0 0 incomplete 2014-08-08 12:39:29.311552 0'0 30748:8027 [20,7] 20 [20,7] 20 13503'1398727 2014-06-26 07:03:25.899254 13503'1398727 2014-06-21 07:02:31.393053
2.587 0 0 0 0 0 0 0 incomplete 2014-08-08 12:39:19.715724 0'0 30748:7060 [13,5] 13 [13,5] 13 13646'1542934 2014-06-26 07:48:42.089935 13646'1542934 2014-06-22 07:46:20.363695
root@ceph-admin-storage:~# ceph osd tree
# id weight type name up/down reweight
-1 99.7 root default
-8 51.06 room room0
-2 19.33 host ceph-1-storage
0 0.91 osd.0 up 1
2 0.91 osd.2 up 1
3 0.91 osd.3 up 1
4 1.82 osd.4 up 1
9 1.36 osd.9 up 1
11 0.68 osd.11 up 1
6 3.64 osd.6 up 1
5 1.82 osd.5 up 1
7 3.64 osd.7 up 1
8 3.64 osd.8 up 1
-3 20 host ceph-2-storage
14 3.64 osd.14 up 1
18 1.36 osd.18 up 1
19 1.36 osd.19 up 1
15 3.64 osd.15 up 1
1 3.64 osd.1 up 1
12 3.64 osd.12 up 1
22 0.68 osd.22 up 1
23 0.68 osd.23 up 1
26 0.68 osd.26 up 1
36 0.68 osd.36 up 1
-4 11.73 host ceph-5-storage
32 0.27 osd.32 up 1
37 0.27 osd.37 up 1
42 0.27 osd.42 up 1
43 1.82 osd.43 up 1
44 1.82 osd.44 up 1
45 1.82 osd.45 up 1
46 1.82 osd.46 up 1
47 1.82 osd.47 up 1
48 1.82 osd.48 up 1
-9 48.64 room room1
-5 15.92 host ceph-3-storage
24 1.82 osd.24 up 1
25 1.82 osd.25 up 1
29 1.36 osd.29 up 1
10 3.64 osd.10 up 1
13 3.64 osd.13 up 1
20 3.64 osd.20 up 1
-6 20 host ceph-4-storage
34 3.64 osd.34 up 1
38 1.36 osd.38 up 1
39 1.36 osd.39 up 1
16 3.64 osd.16 up 1
30 0.68 osd.30 up 1
35 3.64 osd.35 up 1
17 3.64 osd.17 up 1
28 0.68 osd.28 up 1
31 0.68 osd.31 up 1
33 0.68 osd.33 up 1
-7 12.72 host ceph-6-storage
49 0.45 osd.49 up 1
50 0.45 osd.50 up 1
51 0.45 osd.51 up 1
52 0.45 osd.52 up 1
53 1.82 osd.53 up 1
54 1.82 osd.54 up 1
55 1.82 osd.55 up 1
56 1.82 osd.56 up 1
57 1.82 osd.57 up 1
58 1.82 osd.58 up 1
What I have tried so far:
ceph pg repair 2.587 [2.e3 2.c1 2.92]
ceph pg force_create_pg 2.587 [2.e3 2.c1 2.92]
ceph osd lost 5 --yes-i-really-mean-it [7 8 13 20]
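(The brackets above are shorthand: I repeated each command for every listed PG and OSD id. Expanded, assuming that reading, it amounts to:)

for pg in 2.587 2.e3 2.c1 2.92; do ceph pg repair $pg; done
for pg in 2.587 2.e3 2.c1 2.92; do ceph pg force_create_pg $pg; done
for osd in 5 7 8 13 20; do ceph osd lost $osd --yes-i-really-mean-it; done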
The history in brief:
I installed Cuttlefish and upgraded to Dumpling and then to Emperor. The cluster was healthy. Maybe I made a mistake while repairing 8 broken OSDs, but from then on I had incomplete PGs. Finally, I upgraded from Emperor to Firefly.
Regards,
Mike
--------------------------------------------------------------------------------------------------
Bayerischer Rundfunk; Rundfunkplatz 1; 80335 München
Telefon: +49 89 590001; E-Mail: [email protected]; Website: http://www.BR.de