On Thu, Feb 22, 2018 at 9:29 AM Wido den Hollander <[email protected]> wrote:
> Hi,
>
> I have a situation with a cluster which was recently upgraded to
> Luminous and has a PG mapped to OSDs on the same host.
>
> root@man:~# ceph pg map 1.41
> osdmap e21543 pg 1.41 (1.41) -> up [15,7,4] acting [15,7,4]
> root@man:~#
>
> root@man:~# ceph osd find 15|jq -r '.crush_location.host'
> n02
> root@man:~# ceph osd find 7|jq -r '.crush_location.host'
> n01
> root@man:~# ceph osd find 4|jq -r '.crush_location.host'
> n02
> root@man:~#
>
> As you can see, OSDs 15 and 4 are both on host 'n02'.
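The per-OSD lookups above can be automated. A minimal sketch, assuming the osd-to-host map is built from the `ceph osd find | jq` output shown (the helper name is mine, not a Ceph API):

```python
# Toy check: flag hosts holding more than one replica of a PG.
# The osd->host map mirrors the `ceph osd find | jq` output above;
# in practice you would build it by querying each OSD id.
from collections import Counter

osd_host = {15: "n02", 7: "n01", 4: "n02"}

def colocated_hosts(up_set, osd_host):
    """Return hosts that hold more than one replica of the PG."""
    counts = Counter(osd_host[osd] for osd in up_set)
    return sorted(h for h, n in counts.items() if n > 1)

print(colocated_hosts([15, 7, 4], osd_host))  # -> ['n02']
```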
>
> This PG went inactive when the machine hosting both OSDs went down for
> maintenance.
>
> My first suspect was the CRUSHMap and the rules, but those are fine:
>
> rule replicated_ruleset {
>         id 0
>         type replicated
>         min_size 1
>         max_size 10
>         step take default
>         step chooseleaf firstn 0 type host
>         step emit
> }
>
> This is the only rule in the CRUSHMap.
>
> ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
> -1 19.50325 root default
> -2 2.78618 host n01
> 5 ssd 0.92999 osd.5 up 1.00000 1.00000
> 7 ssd 0.92619 osd.7 up 1.00000 1.00000
> 14 ssd 0.92999 osd.14 up 1.00000 1.00000
> -3 2.78618 host n02
> 4 ssd 0.92999 osd.4 up 1.00000 1.00000
> 8 ssd 0.92619 osd.8 up 1.00000 1.00000
> 15 ssd 0.92999 osd.15 up 1.00000 1.00000
> -4 2.78618 host n03
> 3 ssd 0.92999 osd.3 up 0.94577 1.00000
> 9 ssd 0.92619 osd.9 up 0.82001 1.00000
> 16 ssd 0.92999 osd.16 up 0.84885 1.00000
> -5 2.78618 host n04
> 2 ssd 0.92999 osd.2 up 0.93501 1.00000
> 10 ssd 0.92619 osd.10 up 0.76031 1.00000
> 17 ssd 0.92999 osd.17 up 0.82883 1.00000
> -6 2.78618 host n05
> 6 ssd 0.92999 osd.6 up 0.84470 1.00000
> 11 ssd 0.92619 osd.11 up 0.80530 1.00000
> 18 ssd 0.92999 osd.18 up 0.86501 1.00000
> -7 2.78618 host n06
> 1 ssd 0.92999 osd.1 up 0.88353 1.00000
> 12 ssd 0.92619 osd.12 up 0.79602 1.00000
> 19 ssd 0.92999 osd.19 up 0.83171 1.00000
> -8 2.78618 host n07
> 0 ssd 0.92999 osd.0 up 1.00000 1.00000
> 13 ssd 0.92619 osd.13 up 0.86043 1.00000
> 20 ssd 0.92999 osd.20 up 0.77153 1.00000
>
> Here you see osd.15 and osd.4 on the same host 'n02'.
>
> This cluster was upgraded from Hammer to Jewel and now to Luminous, and
> it doesn't have the latest tunables yet. Should that matter? I've never
> encountered this before.
>
> tunable choose_local_tries 0
> tunable choose_local_fallback_tries 0
> tunable choose_total_tries 50
> tunable chooseleaf_descend_once 1
> tunable chooseleaf_vary_r 1
> tunable chooseleaf_stable 1
> tunable straw_calc_version 1
> tunable allowed_bucket_algs 54
>
> I don't want to touch this yet in case it's a bug or a glitch in
> the matrix somewhere.
>
> I hope it's just an admin mistake, but so far I haven't been able to find
> a clue pointing to that.
>
> root@man:~# ceph osd dump|head -n 12
> epoch 21545
> fsid 0b6fb388-6233-4eeb-a55c-476ed12bdf0a
> created 2015-04-28 14:43:53.950159
> modified 2018-02-22 17:56:42.497849
> flags sortbitwise,recovery_deletes,purged_snapdirs
> crush_version 22
> full_ratio 0.95
> backfillfull_ratio 0.9
> nearfull_ratio 0.85
> require_min_compat_client luminous
> min_compat_client luminous
> require_osd_release luminous
> root@man:~#
>
> I also downloaded the CRUSHMap and ran crushtool with --test and
> --show-mappings, but that didn't show any PG mapped to two OSDs on the
> same host.
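crushtool only prints the mappings; the duplicate-host check has to be done on its output. A rough post-processing sketch, where the two mapping lines are invented for illustration but follow the shape `--show-mappings` prints (`CRUSH rule 0 x 65 [15,7,4]`), and the osd-to-host layout copies the first three hosts of the tree above:

```python
import re

# osd->host layout for the first three hosts in the tree above.
osd_host = {5: "n01", 7: "n01", 14: "n01",
            4: "n02", 8: "n02", 15: "n02",
            3: "n03", 9: "n03", 16: "n03"}

# Made-up lines in the shape crushtool --test --show-mappings emits.
lines = [
    "CRUSH rule 0 x 64 [5,4,16]",
    "CRUSH rule 0 x 65 [15,7,4]",
]

def bad_mappings(lines, osd_host):
    """Return (x, up_set) pairs whose replicas share a host."""
    bad = []
    for line in lines:
        m = re.search(r"x (\d+) \[([\d,]+)\]", line)
        if not m:
            continue
        up = [int(o) for o in m.group(2).split(",")]
        hosts = [osd_host[o] for o in up]
        if len(set(hosts)) < len(hosts):
            bad.append((int(m.group(1)), up))
    return bad

print(bad_mappings(lines, osd_host))  # -> [(65, [15, 7, 4])]
```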
>
What *was* the mapping for the PG in question, then?
At a first guess, it sounds to me like CRUSH is failing to map the
appropriate number of participants on this PG, so one of the extant OSDs
from a prior epoch is getting drafted. I would expect this to show up as a
remapped PG.
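As a toy model of that hypothesis (this is deliberately not Ceph's actual peering code): if CRUSH returns fewer OSDs than the pool size, the acting set could be padded with a still-alive member of a prior interval, and that survivor may sit on the same host as one of the freshly mapped OSDs.

```python
# Toy model only: pad a short CRUSH result with prior acting members.
POOL_SIZE = 3

def pad_acting(crush_up, prior_acting, pool_size=POOL_SIZE):
    """Fill the acting set up to pool_size with prior members."""
    acting = list(crush_up)
    for osd in prior_acting:
        if len(acting) >= pool_size:
            break
        if osd not in acting:
            acting.append(osd)
    return acting

# CRUSH only managed to map two OSDs on distinct hosts; osd.4
# (host n02, same as osd.15) survives from the previous epoch.
print(pad_acting(crush_up=[15, 7], prior_acting=[15, 7, 4]))  # -> [15, 7, 4]
```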
-Greg
>
> Any ideas on what might be going on here?
>
> Wido
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>