Hi,
I am seeing some strange behaviour where a pool is stuck. I have no idea
how this pool appeared in the cluster, given that I have not played with
pool creation, *yet*.
##### root@node1:~# ceph -s
cluster 1b147882-722c-43d8-8dfb-38b78d9fbec3
health HEALTH_WARN 333 pgs degraded; 333 pgs stuck unclean; pool
.rgw.buckets has too few pgs
monmap e1: 1 mons at {node1=127.0.0.1:6789/0}, election epoch 1,
quorum 0 node1
osdmap e154: 3 osds: 3 up, 3 in
pgmap v16812: 3855 pgs, 14 pools, 41193 MB data, 24792 objects
57236 MB used, 644 GB / 738 GB avail
3522 active+clean
333 active+degraded
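As an aside, the 333 degraded PGs match the pg_num of the unnamed pool 13 exactly, and the totals in the status output are consistent with the per-pool pg_num values from the osd dump below. A quick sanity check (the per-pool counts are read off the osd dump, not assumed):

```python
# PG counts per pool, taken from the "ceph osd dump" output:
# pools 0-2 ('data', 'metadata', 'rbd') have 64 PGs each,
# pools 3-13 (the RGW pools plus the unnamed pool 13) have 333 PGs each.
pg_counts = [64] * 3 + [333] * 11

total = sum(pg_counts)
print(total)             # 3855, matching "3855 pgs, 14 pools"

degraded = 333           # every PG of the unnamed pool 13
print(total - degraded)  # 3522, matching "3522 active+clean"
```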
##### root@node1:/etc/ceph# ceph osd dump
epoch 154
fsid 1b147882-722c-43d8-8dfb-38b78d9fbec3
created 2014-04-16 20:46:46.516403
modified 2014-04-18 12:14:29.052231
flags
pool 0 'data' rep size 1 min_size 1 crush_ruleset 0 object_hash rjenkins
pg_num 64 pgp_num 64 last_change 1 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 1 min_size 1 crush_ruleset 1 object_hash
rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0
pool 2 'rbd' rep size 1 min_size 1 crush_ruleset 2 object_hash rjenkins
pg_num 64 pgp_num 64 last_change 1 owner 0
pool 3 '.rgw.root' rep size 1 min_size 1 crush_ruleset 0 object_hash
rjenkins pg_num 333 pgp_num 333 last_change 16 owner 0
pool 4 '.rgw.control' rep size 1 min_size 1 crush_ruleset 0 object_hash
rjenkins pg_num 333 pgp_num 333 last_change 18 owner 0
pool 5 '.rgw' rep size 1 min_size 1 crush_ruleset 0 object_hash rjenkins
pg_num 333 pgp_num 333 last_change 20 owner 0
pool 6 '.rgw.gc' rep size 1 min_size 1 crush_ruleset 0 object_hash
rjenkins pg_num 333 pgp_num 333 last_change 21 owner 0
pool 7 '.users.uid' rep size 1 min_size 1 crush_ruleset 0 object_hash
rjenkins pg_num 333 pgp_num 333 last_change 22 owner 0
pool 8 '.users' rep size 1 min_size 1 crush_ruleset 0 object_hash
rjenkins pg_num 333 pgp_num 333 last_change 26 owner 0
pool 9 '.users.swift' rep size 1 min_size 1 crush_ruleset 0 object_hash
rjenkins pg_num 333 pgp_num 333 last_change 28 owner 0
pool 10 '.users.email' rep size 1 min_size 1 crush_ruleset 0 object_hash
rjenkins pg_num 333 pgp_num 333 last_change 56 owner 0
pool 11 '.rgw.buckets.index' rep size 1 min_size 1 crush_ruleset 0
object_hash rjenkins pg_num 333 pgp_num 333 last_change 58 owner
18446744073709551615
pool 12 '.rgw.buckets' rep size 1 min_size 1 crush_ruleset 0 object_hash
rjenkins pg_num 333 pgp_num 333 last_change 60 owner 18446744073709551615
pool 13 '' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins
pg_num 333 pgp_num 333 last_change 146 owner 18446744073709551615
max_osd 5
osd.0 up in weight 1 up_from 151 up_thru 151 down_at 148
last_clean_interval [144,147) 192.168.1.18:6800/26681
192.168.1.18:6801/26681 192.168.1.18:6802/26681 192.168.1.18:6803/26681
exists,up f6f63e8a-42af-4dda-b523-ffb835165420
osd.1 up in weight 1 up_from 149 up_thru 149 down_at 148
last_clean_interval [139,147) 192.168.1.18:6805/26685
192.168.1.18:6806/26685 192.168.1.18:6807/26685 192.168.1.18:6808/26685
exists,up fa4689ac-e0ca-4ec3-ab2a-6afa57cc7498
osd.2 up in weight 1 up_from 153 up_thru 153 down_at 148
last_clean_interval [141,147) 192.168.1.18:6810/26691
192.168.1.18:6811/26691 192.168.1.18:6812/26691 192.168.1.18:6813/26691
exists,up 6b2f7e3f-619c-4922-bdf9-bb0f2eee7413
##### root@node1:/etc/ceph# ceph pg dump_stuck unclean |sort
13.0 0 0 0 0 0 0 0 active+degraded 2014-04-18
12:14:28.438523 0'0 154:13 [0] [0] 0'0 2014-04-18
11:12:05.322855 0'0 2014-04-18 11:12:05.322855
13.100 0 0 0 0 0 0 0 active+degraded
2014-04-18 12:14:26.110633 0'0 154:13 [0] [0] 0'0
2014-04-18 11:12:06.318159 0'0 2014-04-18 11:12:06.318159
13.10 0 0 0 0 0 0 0 active+degraded
2014-04-18 12:14:37.081087 0'0 154:12 [2] [2] 0'0
2014-04-18 11:12:05.642317 0'0 2014-04-18 11:12:05.642317
13.1 0 0 0 0 0 0 0 active+degraded 2014-04-18
12:14:20.874829 0'0 154:13 [1] [1] 0'0 2014-04-18
11:12:05.580874 0'0 2014-04-18 11:12:05.580874
13.101 0 0 0 0 0 0 0 active+degraded
2014-04-18 12:14:16.723100 0'0 154:14 [1] [1] 0'0
2014-04-18 11:12:06.540975 0'0 2014-04-18 11:12:06.540975
13.102 0 0 0 0 0 0 0 active+degraded
2014-04-18 12:14:35.795491 0'0 154:12 [2] [2] 0'0
2014-04-18 11:12:06.543846 0'0 2014-04-18 11:12:06.543846
13.103 0 0 0 0 0 0 0 active+degraded
2014-04-18 12:14:35.809492 0'0 154:12 [2] [2] 0'0
2014-04-18 11:12:06.561542 0'0 2014-04-18 11:12:06.561542
13.104 0 0 0 0 0 0 0 active+degraded
2014-04-18 12:14:35.817750 0'0 154:12 [2] [2] 0'0
2014-04-18 11:12:06.569706 0'0 2014-04-18 11:12:06.569706
13.105 0 0 0 0 0 0 0 active+degraded
2014-04-18 12:14:35.840668 0'0 154:12 [2] [2] 0'0
2014-04-18 11:12:06.602826 0'0 2014-04-18 11:12:06.602826
[...]
13.f7 0 0 0 0 0 0 0 active+degraded
2014-04-18 12:14:16.990648 0'0 154:14 [1] [1] 0'0
2014-04-18 11:12:06.483859 0'0 2014-04-18 11:12:06.483859
13.f8 0 0 0 0 0 0 0 active+degraded
2014-04-18 12:14:35.947686 0'0 154:12 [2] [2] 0'0
2014-04-18 11:12:06.481459 0'0 2014-04-18 11:12:06.481459
13.f9 0 0 0 0 0 0 0 active+degraded
2014-04-18 12:14:35.961392 0'0 154:12 [2] [2] 0'0
2014-04-18 11:12:06.505039 0'0 2014-04-18 11:12:06.505039
13.fa 0 0 0 0 0 0 0 active+degraded
2014-04-18 12:14:17.062254 0'0 154:14 [1] [1] 0'0
2014-04-18 11:12:06.493605 0'0 2014-04-18 11:12:06.493605
13.fb 0 0 0 0 0 0 0 active+degraded
2014-04-18 12:14:17.058748 0'0 154:14 [1] [1] 0'0
2014-04-18 11:12:06.526013 0'0 2014-04-18 11:12:06.526013
13.fc 0 0 0 0 0 0 0 active+degraded
2014-04-18 12:14:26.277414 0'0 154:13 [0] [0] 0'0
2014-04-18 11:12:06.243714 0'0 2014-04-18 11:12:06.243714
13.fd 0 0 0 0 0 0 0 active+degraded
2014-04-18 12:14:26.312618 0'0 154:13 [0] [0] 0'0
2014-04-18 11:12:06.263824 0'0 2014-04-18 11:12:06.263824
13.fe 0 0 0 0 0 0 0 active+degraded
2014-04-18 12:14:35.977273 0'0 154:12 [2] [2] 0'0
2014-04-18 11:12:06.511879 0'0 2014-04-18 11:12:06.511879
13.ff 0 0 0 0 0 0 0 active+degraded
2014-04-18 12:14:26.262810 0'0 154:13 [0] [0] 0'0
2014-04-18 11:12:06.289603 0'0 2014-04-18 11:12:06.289603
pg_stat objects mip degr unf bytes log disklog
state state_stamp v reported up acting last_scrub
scrub_stamp last_deep_scrub deep_scrub_stamp
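Every stuck PG above has a "13." prefix, i.e. belongs to the unnamed pool, and each lists a single OSD in both the up and acting sets even though the pool's rep size is 2, which is exactly the condition for active+degraded. Schematically (the pg/acting pairs below are sampled from the dump above):

```python
pool_size = 2  # rep size of the unnamed pool 13
# A few (pg, acting set) pairs sampled from the dump_stuck output:
stuck = {"13.0": [0], "13.1": [1], "13.10": [2], "13.100": [0]}

for pg, acting in stuck.items():
    # All stuck PGs belong to pool 13...
    assert pg.split(".")[0] == "13"
    # ...and hold fewer replicas than the pool's size, hence "degraded".
    assert len(acting) < pool_size
print("all stuck PGs are single-replica PGs of pool 13")
```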
##### root@node1:~# rados df
pool name category KB objects
clones degraded unfound rd rd KB
wr wr KB
- 0 0
0 0 0 0 0
0 0
.rgw - 1 5
0 0 0 31 23
17 6
.rgw.buckets - 42182267 24733
0 0 0 4485 17420 163372
50559394
.rgw.buckets.index - 0 3
0 0 0 47113 105894
44735 0
.rgw.control - 0 8
0 0 0 0 0
0 0
.rgw.gc - 0 32
0 0 0 7114 7704
8524 0
.rgw.root - 1 3
0 0 0 16 10
3 3
.users - 1 2
0 0 0 0 0
2 2
.users.email - 1 1
0 0 0 0 0
1 1
.users.swift - 1 2
0 0 0 5 3
2 2
.users.uid - 1 3
0 0 0 52 46
16 6
data - 0 0
0 0 0 0 0
0 0
metadata - 0 0
0 0 0 0 0
0 0
rbd - 0 0
0 0 0 0 0
0 0
total used 58610648 24792
total avail 676160692
total space 774092940
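The rados df figures (in KB) line up with the MB figures in ceph -s, so the two views agree:

```python
# rados df reports KB, ceph -s reports MB (1 MB = 1024 KB here).
rgw_buckets_kb = 42182267
total_used_kb = 58610648

print(rgw_buckets_kb // 1024)  # 41193 -> "41193 MB data" in ceph -s
print(total_used_kb // 1024)   # 57236 -> "57236 MB used" in ceph -s
```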
The pool seems empty, so I tried to remove it, but the command complains
about the empty name. The last modification that was done was changing
"osd pool default size" in ceph.conf from 1 to 2 and restarting the whole
cluster (mon + osd); AFAICR the cluster was healthy before doing that.
This is a small test bed, so everything can be trashed, but I am still a
bit curious about what happened and how it could be fixed.
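For the record, one thing I plan to try: `ceph osd pool delete` rejects the empty name, but the `rados` tool might pass the name through verbatim. This is untested on my side and the usual pool-deletion caveats apply:

```shell
# Untested idea: bypass the name validation of "ceph osd pool delete"
# by going through the rados CLI, which takes the pool name verbatim.
rados rmpool "" "" --yes-i-really-really-mean-it
```

If rados also refuses the empty name, renaming the pool first with `ceph osd pool rename` might make it deletable, though I have not verified that either.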
Cheers
--
Cédric
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com