On Thu, Feb 22, 2018 at 9:29 AM Wido den Hollander <[email protected]> wrote:
> Hi,
>
> I have a situation with a cluster which was recently upgraded to
> Luminous and has a PG mapped to OSDs on the same host.
>
> root@man:~# ceph pg map 1.41
> osdmap e21543 pg 1.41 (1.41) -> up [15,7,4] acting [15,7,4]
> root@man:~#
>
> root@man:~# ceph osd find 15|jq -r '.crush_location.host'
> n02
> root@man:~# ceph osd find 7|jq -r '.crush_location.host'
> n01
> root@man:~# ceph osd find 4|jq -r '.crush_location.host'
> n02
> root@man:~#
>
> As you can see, OSDs 15 and 4 are both on host 'n02'.
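The per-OSD lookups above can be automated. A minimal sketch, assuming the osd-to-host map is built from the `ceph osd find | jq` output shown (the helper name is mine, not a Ceph API):

```python
# Toy check: flag hosts holding more than one replica of a PG.
# The osd->host map mirrors the `ceph osd find | jq` output above;
# in practice you would build it by querying each OSD id.
from collections import Counter

osd_host = {15: "n02", 7: "n01", 4: "n02"}

def colocated_hosts(up_set, osd_host):
    """Return hosts that hold more than one replica of the PG."""
    counts = Counter(osd_host[osd] for osd in up_set)
    return sorted(h for h, n in counts.items() if n > 1)

print(colocated_hosts([15, 7, 4], osd_host))  # -> ['n02']
```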
>
> This PG went inactive when the machine hosting both OSDs went down for
> maintenance.
>
> My first suspect was the CRUSHMap and the rules, but those are fine:
>
> rule replicated_ruleset {
>         id 0
>         type replicated
>         min_size 1
>         max_size 10
>         step take default
>         step chooseleaf firstn 0 type host
>         step emit
> }
>
> This is the only rule in the CRUSHMap.
>
> ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
> -1 19.50325 root default
> -2 2.78618 host n01
> 5 ssd 0.92999 osd.5 up 1.00000 1.00000
> 7 ssd 0.92619 osd.7 up 1.00000 1.00000
> 14 ssd 0.92999 osd.14 up 1.00000 1.00000
> -3 2.78618 host n02
> 4 ssd 0.92999 osd.4 up 1.00000 1.00000
> 8 ssd 0.92619 osd.8 up 1.00000 1.00000
> 15 ssd 0.92999 osd.15 up 1.00000 1.00000
> -4 2.78618 host n03
> 3 ssd 0.92999 osd.3 up 0.94577 1.00000
> 9 ssd 0.92619 osd.9 up 0.82001 1.00000
> 16 ssd 0.92999 osd.16 up 0.84885 1.00000
> -5 2.78618 host n04
> 2 ssd 0.92999 osd.2 up 0.93501 1.00000
> 10 ssd 0.92619 osd.10 up 0.76031 1.00000
> 17 ssd 0.92999 osd.17 up 0.82883 1.00000
> -6 2.78618 host n05
> 6 ssd 0.92999 osd.6 up 0.84470 1.00000
> 11 ssd 0.92619 osd.11 up 0.80530 1.00000
> 18 ssd 0.92999 osd.18 up 0.86501 1.00000
> -7 2.78618 host n06
> 1 ssd 0.92999 osd.1 up 0.88353 1.00000
> 12 ssd 0.92619 osd.12 up 0.79602 1.00000
> 19 ssd 0.92999 osd.19 up 0.83171 1.00000
> -8 2.78618 host n07
> 0 ssd 0.92999 osd.0 up 1.00000 1.00000
> 13 ssd 0.92619 osd.13 up 0.86043 1.00000
> 20 ssd 0.92999 osd.20 up 0.77153 1.00000
>
> Here you see osd.15 and osd.4 on the same host 'n02'.
>
> This cluster was upgraded from Hammer to Jewel and now to Luminous, and
> it doesn't have the latest tunables yet. Should that matter? I've never
> encountered this before.
>
> tunable choose_local_tries 0
> tunable choose_local_fallback_tries 0
> tunable choose_total_tries 50
> tunable chooseleaf_descend_once 1
> tunable chooseleaf_vary_r 1
> tunable chooseleaf_stable 1
> tunable straw_calc_version 1
> tunable allowed_bucket_algs 54
>
> I don't want to touch this yet in case it's a bug or a glitch in
> the matrix somewhere.
>
> I hope it's just an admin mistake, but so far I haven't been able to find
> a clue pointing to that.
>
> root@man:~# ceph osd dump|head -n 12
> epoch 21545
> fsid 0b6fb388-6233-4eeb-a55c-476ed12bdf0a
> created 2015-04-28 14:43:53.950159
> modified 2018-02-22 17:56:42.497849
> flags sortbitwise,recovery_deletes,purged_snapdirs
> crush_version 22
> full_ratio 0.95
> backfillfull_ratio 0.9
> nearfull_ratio 0.85
> require_min_compat_client luminous
> min_compat_client luminous
> require_osd_release luminous
> root@man:~#
>
> I also downloaded the CRUSHMap and ran crushtool with --test and
> --show-mappings, but that didn't show any PG mapped to two OSDs on the
> same host.
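crushtool only prints the mappings; the duplicate-host check has to be done on its output. A rough post-processing sketch, where the two mapping lines are invented for illustration but follow the shape `--show-mappings` prints (`CRUSH rule 0 x 65 [15,7,4]`), and the osd-to-host layout copies the first three hosts of the tree above:

```python
import re

# osd->host layout for the first three hosts in the tree above.
osd_host = {5: "n01", 7: "n01", 14: "n01",
            4: "n02", 8: "n02", 15: "n02",
            3: "n03", 9: "n03", 16: "n03"}

# Made-up lines in the shape crushtool --test --show-mappings emits.
lines = [
    "CRUSH rule 0 x 64 [5,4,16]",
    "CRUSH rule 0 x 65 [15,7,4]",
]

def bad_mappings(lines, osd_host):
    """Return (x, up_set) pairs whose replicas share a host."""
    bad = []
    for line in lines:
        m = re.search(r"x (\d+) \[([\d,]+)\]", line)
        if not m:
            continue
        up = [int(o) for o in m.group(2).split(",")]
        hosts = [osd_host[o] for o in up]
        if len(set(hosts)) < len(hosts):
            bad.append((int(m.group(1)), up))
    return bad

print(bad_mappings(lines, osd_host))  # -> [(65, [15, 7, 4])]
```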
>
What *was* the mapping for the PG in question, then?
At a first guess, it sounds to me like CRUSH is failing to map the
appropriate number of participants on this PG, so one of the extant OSDs
from a prior epoch is getting drafted. I would expect this to show up as a
remapped PG.
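As a toy model of that hypothesis (this is deliberately not Ceph's actual peering code): if CRUSH returns fewer OSDs than the pool size, the acting set could be padded with a still-alive member of a prior interval, and that survivor may sit on the same host as one of the freshly mapped OSDs.

```python
# Toy model only: pad a short CRUSH result with prior acting members.
POOL_SIZE = 3

def pad_acting(crush_up, prior_acting, pool_size=POOL_SIZE):
    """Fill the acting set up to pool_size with prior members."""
    acting = list(crush_up)
    for osd in prior_acting:
        if len(acting) >= pool_size:
            break
        if osd not in acting:
            acting.append(osd)
    return acting

# CRUSH only managed to map two OSDs on distinct hosts; osd.4
# (host n02, same as osd.15) survives from the previous epoch.
print(pad_acting(crush_up=[15, 7], prior_acting=[15, 7, 4]))  # -> [15, 7, 4]
```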
-Greg
>
> Any ideas on what might be going on here?
>
> Wido
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>