Package: ceph
Version: 14.2.9-1~bpo10+1

Dear maintainers,

I run a cluster made of armhf and amd64 OSDs, with amd64 monitors and manager. I recently upgraded the cluster from Luminous (12, in buster) to Nautilus (14, in buster-backports), following the instructions here:

https://docs.ceph.com/docs/master/releases/nautilus/#upgrading-from-mimic-or-luminous

At some point (and after hot-fixing for #956293 on the armhf machines), I noticed something was off: my OSDs kept flipping between up and down, with all machines of one architecture up and the others down. Eventually, the armhf OSDs went down for good (in the monitors' view). (This might be when I enabled msgr2, but I do not remember the exact timing.)

Starting one of the armhf OSDs causes this kind of line to appear in the monitors' logs:

2020-05-25 02:07:55.681 7f142df5b700 -1 --2- [v2:[fdfc:0:0:2::e]:3300/0,v1:[fdfc:0:0:2::e]:6789/0] >> conn(0x55f003781a80 0x55f004589b80 unknown :-1 s=HELLO_ACCEPTING pgs=0 cs=0 l=0 rx=0 tx=0).run_continuation failed decoding of frame header: buffer::bad_alloc

Moving the disk and config from an armhf to an arm64 machine fixes the issue.