Hi,
here is the output:
ceph osd crush show-tunables
{
    "choose_local_tries": 0,
    "choose_local_fallback_tries": 0,
    "choose_total_tries": 100,
    "chooseleaf_descend_once": 1,
    "chooseleaf_vary_r": 1,
    "chooseleaf_stable": 0,
    "straw_calc_version": 1,
    "allowed_bucket_algs": 22,
    "profile": "unknown",
    "optimal_tunables": 0,
    "legacy_tunables": 0,
    "minimum_required_version": "hammer",
    "require_feature_tunables": 1,
    "require_feature_tunables2": 1,
    "has_v2_rules": 0,
    "require_feature_tunables3": 1,
    "has_v3_rules": 0,
    "has_v4_buckets": 1,
    "require_feature_tunables5": 0,
    "has_v5_rules": 0
}
[root@ceph-mon-01 ~]#
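
If it does come down to adjusting tunables, the usual way to edit them
by hand is to decompile the CRUSH map, edit the "tunable ..." lines at
the top, recompile and inject it back. A sketch (ordinary
Ceph/crushtool commands; I have not run this on the cluster yet):

ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# edit the tunable lines at the top of crushmap.txt, then:
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new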
--
Regards
Lukasz
> Hi,
> This looks like a tunables issue.
> What is the output of `ceph osd crush show-tunables`?
> -- Dan
> On Fri, Jun 14, 2019 at 11:19 AM Luk <[email protected]> wrote:
>>
>> Hello,
>>
>> Maybe someone has already fought this kind of stuck state in Ceph.
>> This is a production cluster, so I can't/don't want to make any wrong
>> steps; please advise what to do.
>>
>> After replacing one failed disk (it was osd-7) in our cluster, Ceph
>> didn't recover to HEALTH_OK; it is stuck in this state:
>>
>> [root@ceph-mon-01 ~]# ceph -s
>>   cluster:
>>     id:     b6f23cff-7279-f4b0-ff91-21fadac95bb5
>>     health: HEALTH_WARN
>>             noout,noscrub,nodeep-scrub flag(s) set
>>             Degraded data redundancy: 24761/45994899 objects degraded (0.054%), 8 pgs degraded, 8 pgs undersized
>>
>>   services:
>>     mon:        3 daemons, quorum ceph-mon-01,ceph-mon-02,ceph-mon-03
>>     mgr:        ceph-mon-03(active), standbys: ceph-mon-02, ceph-mon-01
>>     osd:        144 osds: 144 up, 144 in
>>                 flags noout,noscrub,nodeep-scrub
>>     rbd-mirror: 3 daemons active
>>     rgw:        6 daemons active
>>
>>   data:
>>     pools:   18 pools, 2176 pgs
>>     objects: 15.33M objects, 49.3TiB
>>     usage:   151TiB used, 252TiB / 403TiB avail
>>     pgs:     24761/45994899 objects degraded (0.054%)
>>              2168 active+clean
>>              8    active+undersized+degraded
>>
>>   io:
>>     client: 435MiB/s rd, 415MiB/s wr, 7.94kop/s rd, 2.96kop/s wr
>>
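>> (For reference, the individual degraded PGs can be listed with the
>> standard commands, e.g.:
>>
>> ceph health detail
>> ceph pg dump_stuck degraded
>>
>> which is how PGs like 10.3c4 below can be found.)
>>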
>> Restarting the OSDs didn't help, and changing choose_total_tries from
>> 50 to 100 didn't help either.
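>>
>> Since this is production, one low-risk check is to test the CRUSH map
>> offline with crushtool before touching the cluster (a sketch; rule id
>> 0 and --num-rep 3 are placeholders, substitute the actual rule and
>> size of pool 10):
>>
>> ceph osd getcrushmap -o cm.bin
>> crushtool -i cm.bin --test --rule 0 --num-rep 3 --show-bad-mappings
>> # repeat with a candidate tunable applied only to the offline copy:
>> crushtool -i cm.bin --set-chooseleaf-stable 1 --test --rule 0 --num-rep 3 --show-bad-mappings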
>>
>> I checked one of the degraded PGs, 10.3c4:
>>
>> [root@ceph-mon-01 ~]# ceph pg dump 2>&1 | grep -w 10.3c4
>> 10.3c4  3593  0  3593  0  0  14769891858  10076  10076  active+undersized+degraded  2019-06-13 08:19:39.802219  37380'71900564  37380:119411139  [9,109]  9  [9,109]  9  33550'69130424  2019-06-08 02:28:40.508790  33550'69130424  2019-06-08 02:28:40.508790  18
>>
>>
>> [root@ceph-mon-01 ~]# ceph pg 10.3c4 query | jq '.["peer_info"][] | {peer: .peer, last_update: .last_update}'
>> {
>>   "peer": "0",
>>   "last_update": "36847'71412720"
>> }
>> {
>>   "peer": "109",
>>   "last_update": "37380'71900570"
>> }
>> {
>>   "peer": "117",
>>   "last_update": "0'0"
>> }
>>
>> [root@ceph-mon-01 ~]#
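>> (A quicker view of the mapping itself, if useful: `ceph pg map 10.3c4`
>> prints the osdmap epoch and the up/acting sets for the PG on one line.)
>>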
>> I checked the space taken by this PG on the storage nodes. Here is how
>> to check where a particular OSD is (on which physical storage node):
>> [root@ceph-mon-01 ~]# ceph osd status 2>&1 | grep " 9 "
>> | 9   | stor-a02 | 2063G | 5386G | 52 | 1347k | 53 | 292k  | exists,up |
>> [root@ceph-mon-01 ~]# ceph osd status 2>&1 | grep " 109 "
>> | 109 | stor-a01 | 1285G | 4301G | 5  | 31.0k | 6  | 59.2k | exists,up |
>> [root@ceph-mon-01 ~]# ceph osd status 2>&1 | grep " 117 "
>> | 117 | stor-b02 | 1334G | 4252G | 54 | 1216k | 13 | 27.4k | exists,up |
>> [root@ceph-mon-01 ~]# ceph osd status 2>&1 | grep " 0 "
>> | 0   | stor-a01 | 2156G | 5293G | 58 | 387k  | 29 | 30.7k | exists,up |
>> [root@ceph-mon-01 ~]#
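>> (An alternative to grepping `ceph osd status`: `ceph osd find <id>`,
>> e.g. `ceph osd find 117`, prints the OSD's host and CRUSH location as
>> JSON.)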
>> and checking the sizes on the servers:
>> stor-a01 (this PG shouldn't have two copies on the same host):
>> [root@stor-a01 /var/lib/ceph/osd/ceph-0/current]# du -sh 10.3c4_*
>> 2.4G 10.3c4_head
>> 0 10.3c4_TEMP
>> [root@stor-a01 /var/lib/ceph/osd/ceph-109/current]# du -sh 10.3c4_*
>> 14G 10.3c4_head
>> 0 10.3c4_TEMP
>> [root@stor-a01 /var/lib/ceph/osd/ceph-109/current]#
>> stor-a02:
>> [root@stor-a02 /var/lib/ceph/osd/ceph-9/current]# du -sh 10.3c4_*
>> 14G 10.3c4_head
>> 0 10.3c4_TEMP
>> [root@stor-a02 /var/lib/ceph/osd/ceph-9/current]#
>> stor-b02:
>> [root@stor-b02 /var/lib/ceph/osd/ceph-117/current]# du -sh 10.3c4_*
>> zsh: no matches found: 10.3c4_*
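>>
>> To see why peering settled on only two copies, the peering history can
>> also be pulled from the query output (same jq approach as above):
>>
>> ceph pg 10.3c4 query | jq '.recovery_state'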
>>
>> Information about the Ceph versions:
>> [root@ceph-mon-01 ~]# ceph versions
>> {
>>     "mon": {
>>         "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 3
>>     },
>>     "mgr": {
>>         "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 3
>>     },
>>     "osd": {
>>         "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 144
>>     },
>>     "mds": {},
>>     "rbd-mirror": {
>>         "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 3
>>     },
>>     "rgw": {
>>         "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 6
>>     },
>>     "overall": {
>>         "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 159
>>     }
>> }
>>
>> crushmap: https://pastebin.com/cpC2WmyS
>> ceph osd tree: https://pastebin.com/XvZ2cNZZ
>>
>> I'm cross-posting this to the devel list because maybe there is a
>> known bug in this particular version of Ceph, and you could point me
>> in some direction to fix this problem.
>>
>> --
>> Regards
>> Lukasz
>>
--
Regards,
Luk