We are running 14.2.4 (the same issue was also observed on 14.2.2) and have a
problem with a cluster, thankfully a testing cluster, where all PGs are failing
to peer and are stuck in peering, unknown, stale, and similar states.

My working theory is that this happens because the OSDs don't seem to be
using msgr v2: `ceph osd find osd.NN` lists only a v1 address in the
addrvec. By contrast, on our working 14.2.4 clusters both v1 and v2
addresses are listed.
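To check this across many OSDs without reading each addrvec by eye, a small script can parse the JSON form of that command. This is a minimal sketch, assuming `ceph osd find osd.NN --format json` returns an `addrs.addrvec` list whose entries carry a `type` of `"v1"` or `"v2"`; the sample below is made up to match that assumed shape, and the helper names are hypothetical.

```python
import json

# Made-up sample in the shape we assume `ceph osd find osd.NN --format json`
# returns on Nautilus: an addrvec listing one entry per messenger protocol.
SAMPLE = """
{
  "osd": 12,
  "addrs": {
    "addrvec": [
      {"type": "v1", "addr": "10.0.0.5:6802", "nonce": 4242}
    ]
  }
}
"""


def addr_types(find_json: str) -> set:
    """Return the set of messenger types ("v1", "v2") in an OSD's addrvec."""
    doc = json.loads(find_json)
    return {entry["type"] for entry in doc["addrs"]["addrvec"]}


def has_msgr2(find_json: str) -> bool:
    """True if the OSD advertises at least one msgr v2 address."""
    return "v2" in addr_types(find_json)


if __name__ == "__main__":
    print(sorted(addr_types(SAMPLE)))  # ['v1']
    print(has_msgr2(SAMPLE))           # False
```

On a healthy Nautilus OSD we would expect `has_msgr2` to return True; on the affected cluster it matches the v1-only addrvec described above.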

Our monitors, according to `ceph mon dump`, each run on both v1 and v2 on the
default ports (3300 for v2, 6789 for v1), and I am able to reach each of those
ports on all the mons from a few test OSD nodes.
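The port reachability test can be scripted so it is easy to repeat from every OSD node. This is a minimal sketch that only checks whether a plain TCP connection to each default monitor port succeeds; it does not speak either messenger protocol, and the function names are hypothetical.

```python
import socket

# Default monitor ports: 3300 for msgr v2, 6789 for legacy v1.
MON_PORTS = {"v2": 3300, "v1": 6789}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_mons(mon_ips, timeout: float = 2.0) -> dict:
    """Map each (ip, protocol) pair to whether its default port is reachable."""
    return {
        (ip, proto): port_open(ip, port, timeout)
        for ip in mon_ips
        for proto, port in MON_PORTS.items()
    }
```

For example, `check_mons(["10.0.0.1", "10.0.0.2", "10.0.0.3"])`, with the hypothetical IPs replaced by the addresses from `ceph mon dump`, returns one True/False entry per monitor and protocol.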

The OSD logs are filled with messages of the form
`heartbeat_check: no reply from <IP> <OSD.XY> ever on either front or back`.
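To see whether the heartbeat failures point at a few peers or at all of them, the messages can be tallied per peer OSD. This is a minimal sketch, assuming each log line contains `heartbeat_check: no reply from <addr> osd.<id>` as quoted above; the regex and function name are hypothetical and the exact log format may vary between releases.

```python
import re
from collections import Counter

# Assumed shape of the heartbeat failure message quoted above:
# "... heartbeat_check: no reply from <addr> osd.<id> ever on either front or back ..."
HB_RE = re.compile(r"heartbeat_check: no reply from (\S+) (osd\.\d+)")


def unreachable_peers(lines) -> Counter:
    """Count heartbeat failures per peer OSD across the given log lines."""
    counts = Counter()
    for line in lines:
        match = HB_RE.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts
```

Usage might look like `unreachable_peers(open("/var/log/ceph/ceph-osd.3.log")).most_common(10)`, assuming the default log location; if nearly every peer shows up, that is consistent with a cluster-wide connectivity or addressing problem rather than a few bad hosts.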

I have tried changing mon_host in ceph.conf on the OSDs to use either the
standard comma-separated IP list or the new bracketed format, and then
restarted the OSD daemons on a number of OSDs, but this doesn't seem to
affect the addrvec.
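For reference, the two mon_host formats mentioned above can be generated from a list of monitor IPs. This is a minimal sketch, assuming IPv4 addresses and the default ports; the helper names are hypothetical, and the bracketed form follows the Nautilus `[v2:IP:3300,v1:IP:6789]` per-monitor grouping.

```python
def mon_host_legacy(ips) -> str:
    """Comma-separated list of bare IPs; clients fall back to default ports."""
    return ",".join(ips)


def mon_host_bracketed(ips, v2_port: int = 3300, v1_port: int = 6789) -> str:
    """Nautilus-style list: one [v2:...,v1:...] addrvec group per monitor."""
    return ",".join(f"[v2:{ip}:{v2_port},v1:{ip}:{v1_port}]" for ip in ips)
```

With `ips = ["10.0.0.1", "10.0.0.2"]` (hypothetical addresses), the bracketed helper yields `[v2:10.0.0.1:3300,v1:10.0.0.1:6789],[v2:10.0.0.2:3300,v1:10.0.0.2:6789]`, which would be set as `mon_host = ...` in ceph.conf. As noted above, neither form changed the OSD addrvec here.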

My goal is to get the OSDs working on v2 and see whether they are then able
to begin peering. How can I force the addrvec to update? Thanks.

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
