On Wed, May 3, 2023 at 11:24 AM Kamil Madac <kamil.ma...@gmail.com> wrote:
>
> Hi,
>
> We deployed a Pacific cluster (16.2.12) with cephadm. We are experiencing
> the following error during rbd map:
>
> [Wed May  3 08:59:11 2023] libceph: mon2 (1)[2a00:da8:ffef:1433::]:6789
> session established
> [Wed May  3 08:59:11 2023] libceph: another match of type 1 in addrvec
> [Wed May  3 08:59:11 2023] libceph: corrupt full osdmap (-22) epoch 200 off
> 1042 (000000009876284d of 000000000cb24b58-0000000080b70596)
> [Wed May  3 08:59:11 2023] osdmap: 00000000: 08 07 7d 10 00 00 09 01 5d 09
> 00 00 a2 22 3b 86  ..}.....]....";.
> [Wed May  3 08:59:11 2023] osdmap: 00000010: e4 f5 11 ed 99 ee 47 75 ca 3c
> ad 23 c8 00 00 00  ......Gu.<.#....
> [Wed May  3 08:59:11 2023] osdmap: 00000020: 21 68 4a 64 98 d2 5d 2e 84 fd
> 50 64 d9 3a 48 26  !hJd..]...Pd.:H&
> [Wed May  3 08:59:11 2023] osdmap: 00000030: 02 00 00 00 01 00 00 00 00 00
> 00 00 1d 05 71 01  ..............q.
> ....
>
> The Linux kernel is 6.1.13, and the important detail is that we are using
> IPv6 addresses for the connections to the Ceph nodes.
> We were able to map the RBD image from a client running kernel 5.10, but in
> the prod environment we are not allowed to use that kernel.
>
> What could be the reason for this behavior on newer kernels, and how can we
> troubleshoot it?
>
> Here is output of ceph osd dump:
>
> # ceph osd dump
> epoch 200
> fsid a2223b86-e4f5-11ed-99ee-4775ca3cad23
> created 2023-04-27T12:18:41.777900+0000
> modified 2023-05-02T12:09:40.642267+0000
> flags sortbitwise,recovery_deletes,purged_snapdirs,pglog_hardlimit
> crush_version 34
> full_ratio 0.95
> backfillfull_ratio 0.9
> nearfull_ratio 0.85
> require_min_compat_client luminous
> min_compat_client jewel
> require_osd_release pacific
> stretch_mode_enabled false
> pool 1 'device_health_metrics' replicated size 3 min_size 2 crush_rule 0
> object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 183
> flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application
> mgr_devicehealth
> pool 2 'idp' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins
> pg_num 32 pgp_num 32 autoscale_mode on last_change 48 flags
> hashpspool,selfmanaged_snaps stripe_width 0 application rbd
> max_osd 3
> osd.0 up   in  weight 1 up_from 176 up_thru 182 down_at 172
> last_clean_interval [170,171)
> [v2:[2a00:da8:ffef:1431::]:6800/805023868,v1:[2a00:da8:ffef:1431::]:6801/805023868,v2:
> 0.0.0.0:6802/805023868,v1:0.0.0.0:6803/805023868]
> [v2:[2a00:da8:ffef:1431::]:6804/805023868,v1:[2a00:da8:ffef:1431::]:6805/805023868,v2:
> 0.0.0.0:6806/805023868,v1:0.0.0.0:6807/805023868] exists,up
> e8fd0ee2-ea63-4d02-8f36-219d36869078
> osd.1 up   in  weight 1 up_from 136 up_thru 182 down_at 0
> last_clean_interval [0,0)
> [v2:[2a00:da8:ffef:1432::]:6800/2172723816,v1:[2a00:da8:ffef:1432::]:6801/2172723816,v2:
> 0.0.0.0:6802/2172723816,v1:0.0.0.0:6803/2172723816]
> [v2:[2a00:da8:ffef:1432::]:6804/2172723816,v1:[2a00:da8:ffef:1432::]:6805/2172723816,v2:
> 0.0.0.0:6806/2172723816,v1:0.0.0.0:6807/2172723816] exists,up
> 0b7b5628-9273-4757-85fb-9c16e8441895
> osd.2 up   in  weight 1 up_from 182 up_thru 182 down_at 178
> last_clean_interval [123,177)
> [v2:[2a00:da8:ffef:1433::]:6800/887631330,v1:[2a00:da8:ffef:1433::]:6801/887631330,v2:
> 0.0.0.0:6802/887631330,v1:0.0.0.0:6803/887631330]
> [v2:[2a00:da8:ffef:1433::]:6804/887631330,v1:[2a00:da8:ffef:1433::]:6805/887631330,v2:
> 0.0.0.0:6806/887631330,v1:0.0.0.0:6807/887631330] exists,up
> 21f8d0d5-6a3f-4f78-96c8-8ec4e4f78a01

Hi Kamil,

The issue is the bogus 0.0.0.0 addresses in the OSD address vectors: that is
what the "another match of type 1 in addrvec" error in your log is about.
This came up before, see [1] and the later messages from Stefan in that
thread.  You would need to ensure that ms_bind_ipv4 is set to false and
restart the OSDs.
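
For example, on a cephadm cluster something like this should work (a minimal
sketch; the daemon names match the three OSDs in your dump, adjust to your
deployment):

  # ceph config set osd ms_bind_ipv4 false
  # ceph orch daemon restart osd.0
  # ceph orch daemon restart osd.1
  # ceph orch daemon restart osd.2

Once the OSDs have re-registered, "ceph osd dump" should no longer show any
0.0.0.0 entries in the address vectors, and the kernel client should be able
to decode the osdmap again.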

[1] 
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/Q6VYRJBPHQI63OQTBJG2N3BJD2KBEZM4/

Thanks,

                Ilya