Hello,
running the latest Hammer release, I think I have hit a bug with tunables
more recent than the legacy ones.
After running with legacy tunables for a while, I decided to experiment with
"better" tunables, so I first moved from the argonaut profile to bobtail and
then to firefly. However, I decided to change chooseleaf_vary_r incrementally
(because the remapping from 0 to 5 was huge), stepping from 5 down to the
optimal value (1). When I reached chooseleaf_vary_r = 2, I ran a simple test
before going on to chooseleaf_vary_r = 1: stop one OSD (osd.14) and let the
cluster recover. However, recovery never completes and one PG remains stuck,
reported as undersized+degraded. No OSD is near full and all pools have
min_size=1.
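In case the exact procedure matters: an individual chooseleaf_vary_r step can
be applied with the usual crushmap round-trip; a minimal sketch of one such
step (the paths below are just examples):

# ceph osd getcrushmap -o /tmp/crush.map
# crushtool -d /tmp/crush.map -o /tmp/crush.txt
  (edit /tmp/crush.txt and set: tunable chooseleaf_vary_r 2)
# crushtool -c /tmp/crush.txt -o /tmp/crush.new
# ceph osd setcrushmap -i /tmp/crush.new

The current tunables on the cluster: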
# ceph osd crush show-tunables -f json-pretty
{
    "choose_local_tries": 0,
    "choose_local_fallback_tries": 0,
    "choose_total_tries": 50,
    "chooseleaf_descend_once": 1,
    "chooseleaf_vary_r": 2,
    "straw_calc_version": 1,
    "allowed_bucket_algs": 22,
    "profile": "unknown",
    "optimal_tunables": 0,
    "legacy_tunables": 0,
    "require_feature_tunables": 1,
    "require_feature_tunables2": 1,
    "require_feature_tunables3": 1,
    "has_v2_rules": 0,
    "has_v3_rules": 0,
    "has_v4_buckets": 0
}
The really strange thing is that the acting OSDs of the stuck PG live on
nodes other than the one hosting the OSD I stopped (osd.14).
# ceph pg dump_stuck
ok
pg_stat state up up_primary acting acting_primary
179.38 active+undersized+degraded [2,8] 2 [2,8] 2
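I have not dug into the PG itself yet; if it helps I can post the output of a
PG query, which should show why recovery stalls for it (e.g. whether a third
OSD ever gets picked):

# ceph pg 179.38 query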
# ceph osd tree
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 11.19995 root default
-3 11.19995 rack unknownrack
-2 0.39999 host staging-rd0-03
14 0.20000 osd.14 up 1.00000 1.00000
15 0.20000 osd.15 up 1.00000 1.00000
-8 5.19998 host staging-rd0-01
6 0.59999 osd.6 up 1.00000 1.00000
7 0.59999 osd.7 up 1.00000 1.00000
8 1.00000 osd.8 up 1.00000 1.00000
9 1.00000 osd.9 up 1.00000 1.00000
10 1.00000 osd.10 up 1.00000 1.00000
11 1.00000 osd.11 up 1.00000 1.00000
-7 5.19998 host staging-rd0-00
0 0.59999 osd.0 up 1.00000 1.00000
1 0.59999 osd.1 up 1.00000 1.00000
2 1.00000 osd.2 up 1.00000 1.00000
3 1.00000 osd.3 up 1.00000 1.00000
4 1.00000 osd.4 up 1.00000 1.00000
5 1.00000 osd.5 up 1.00000 1.00000
-4 0.39999 host staging-rd0-02
12 0.20000 osd.12 up 1.00000 1.00000
13 0.20000 osd.13 up 1.00000 1.00000
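A crushtool run against the current map should also show whether CRUSH itself
fails to produce enough replicas for the affected rule with
chooseleaf_vary_r = 2; a rough sketch (the rule id and replica count are
placeholders for pool 179's actual values):

# ceph osd getcrushmap -o /tmp/crush.map
# crushtool -i /tmp/crush.map --test --show-bad-mappings --rule 0 --num-rep 3 --min-x 0 --max-x 1023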
Have you experienced something similar?
Regards,
Kostis