On Wed, Jul 18, 2018 at 3:20 AM Anthony D'Atri <a...@dreamsnake.net> wrote:
>
> The documentation here:
>
> http://docs.ceph.com/docs/master/rados/configuration/mon-osd-interaction/
>
> says
>
> "Each Ceph OSD Daemon checks the heartbeat of other Ceph OSD Daemons every 6 
> seconds"
>
> and
>
> " If a neighboring Ceph OSD Daemon doesn’t show a heartbeat within a 20 
> second grace period, the Ceph OSD Daemon may consider the neighboring Ceph 
> OSD Daemon down and report it back to a Ceph Monitor,"
>
> I've always thought that each OSD heartbeats with *every* other OSD, which of 
> course means that total heartbeat traffic grows ~ quadratically.  However in 
> extended testing we've observed that the number of other OSDs that a subject 
> OSD heartbeats with was < N, which has us wondering if perhaps only OSDs 
> with which a given OSD shares PGs are contacted -- or some other subset.
>
>

OSDs heartbeat with their peers: the set of OSDs with which they share
at least one PG.
You can see each OSD's heartbeat peers in the HB_PEERS column of
"ceph pg dump" -- look for the header "OSD_STAT USED AVAIL TOTAL HB_PEERS ...".
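As a quick illustration, here is a small sketch that parses the HB_PEERS column out of that section. The sample text is hypothetical (column order varies between Ceph releases, so adjust the field index for yours):

```python
# Parse the HB_PEERS column from the OSD_STAT section of `ceph pg dump`.
# The sample below is hypothetical output; real column layout varies by release.
sample = """\
OSD_STAT USED  AVAIL TOTAL HB_PEERS
2        1090M 97.1G 98.2G [0,1,3]
1        1090M 97.1G 98.2G [0,2,3]
0        1090M 97.1G 98.2G [1,2,3]
"""

def hb_peers(dump_text):
    """Return {osd_id: [peer ids]} from OSD_STAT-style text."""
    peers = {}
    for line in dump_text.splitlines()[1:]:   # skip the header row
        fields = line.split()
        osd_id = int(fields[0])
        # HB_PEERS is the 5th column here, e.g. "[0,1,3]"
        peers[osd_id] = [int(p) for p in fields[4].strip("[]").split(",")]
    return peers

print(hb_peers(sample))
```

In a live cluster you would feed this the output of `ceph pg dump osds` instead of the canned sample.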

This is one of the nice features of the placement group concept --
per-OSD heartbeat and peering load is bounded by the number of PGs per
OSD, which stays roughly constant, rather than scaling up with the
total number of OSDs in the cluster.
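To make that scaling claim concrete, here is a toy simulation (random PG placement, not Ceph's actual CRUSH mapping) showing that the average number of distinct heartbeat peers per OSD is capped by pgs_per_osd * (replicas - 1), no matter how large the cluster grows:

```python
import random

def avg_distinct_peers(num_osds, pgs_per_osd=100, replicas=3):
    """Toy model (not CRUSH): place PGs on random replica-sized OSD sets
    and return the average number of distinct peers per OSD."""
    num_pgs = num_osds * pgs_per_osd // replicas
    peers = {o: set() for o in range(num_osds)}
    rng = random.Random(42)  # fixed seed for reproducibility
    for _ in range(num_pgs):
        acting = rng.sample(range(num_osds), replicas)
        for o in acting:
            peers[o].update(p for p in acting if p != o)
    return sum(len(s) for s in peers.values()) / num_osds

# With 100 PGs/OSD and 3 replicas, peers per OSD can never exceed
# 100 * 2 = 200, so growth flattens well before the cluster does.
for n in (100, 1000, 5000):
    print(n, round(avg_distinct_peers(n)))
```

The printed peer counts stop growing once the cluster is larger than the per-OSD PG bound, which is exactly why heartbeat traffic does not grow quadratically.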

Cheers, Dan


> I plan to submit a doc fix for mon_osd_min_down_reporters and wanted to 
> resolve this FUD first.
>
> -- aad
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com