On Wed, Jul 18, 2018 at 3:20 AM Anthony D'Atri <a...@dreamsnake.net> wrote:
>
> The documentation here:
>
> http://docs.ceph.com/docs/master/rados/configuration/mon-osd-interaction/
>
> says
>
> "Each Ceph OSD Daemon checks the heartbeat of other Ceph OSD Daemons every 6 seconds"
>
> and
>
> "If a neighboring Ceph OSD Daemon doesn't show a heartbeat within a 20 second grace period, the Ceph OSD Daemon may consider the neighboring Ceph OSD Daemon down and report it back to a Ceph Monitor."
>
> I've always thought that each OSD heartbeats with *every* other OSD, which of course means that total heartbeat traffic grows ~ quadratically. However, in extended testing we've observed that the number of other OSDs with which a subject OSD heartbeats was < N, which has us wondering if perhaps only the OSDs with which a given OSD shares PGs are contacted -- or some other subset.
>
OSDs heartbeat with their peers: the set of OSDs with which they share at least one PG. You can see the heartbeat peers in the output of ceph pg dump -- look for the HB_PEERS column after the header "OSD_STAT USED AVAIL TOTAL HB_PEERS...".

This is one of the nice features of the placement group concept: heartbeat and peering traffic scales with the number of PGs per OSD, which stays roughly constant, rather than with the total number of OSDs in the cluster.

Cheers, Dan

> I plan to submit a doc fix for mon_osd_min_down_reporters and wanted to
> resolve this FUD first.
>
> -- aad
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
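To make the scaling argument concrete, here is a minimal sketch of how the heartbeat peer set falls out of the PG mapping. The pg_acting_sets dictionary below is a hypothetical PG-to-acting-set mapping invented for illustration (in a real cluster this comes from CRUSH and is visible in ceph pg dump); the point is only that each OSD's peer set is the union of the OSDs in its own PGs, so its size is bounded by PGs-per-OSD times replica count, independent of cluster size.

```python
from collections import defaultdict

# Hypothetical PG -> acting-set mapping (lists of OSD ids), standing in
# for what "ceph pg dump" would report on a real cluster.
pg_acting_sets = {
    "1.0": [0, 1, 2],
    "1.1": [1, 3, 4],
    "1.2": [0, 3, 5],
    "1.3": [2, 4, 5],
}

def heartbeat_peers(pg_map):
    """Each OSD heartbeats only the OSDs it shares at least one PG with."""
    peers = defaultdict(set)
    for acting in pg_map.values():
        for osd in acting:
            # Every other member of the same acting set is a heartbeat peer.
            peers[osd].update(o for o in acting if o != osd)
    return peers

peers = heartbeat_peers(pg_acting_sets)
for osd in sorted(peers):
    print(f"osd.{osd} -> HB_PEERS {sorted(peers[osd])}")
```

Adding more OSDs to the cluster grows the total number of PGs, but as long as PGs-per-OSD is held steady, each individual OSD's HB_PEERS list (and hence its heartbeat traffic) stays about the same size.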