On Sat, 24 Jun 2023, Brian Buhrow wrote:
> In any case, the fact that you're getting regular delays on your pings suggests there is a delay between the time when the ARP cache times out and when it gets refreshed.
This would be determined by `net.inet.arp.nd_delay' I think (on -HEAD).
> As a consequence of that delay, if you have a high-speed stream running when the cache times out, it's possible the send buffer of the sending process, i.e. sshd, is filling up before that cache gets refreshed and the packets can flow again.
In this case, the kernel would either block the sshd process or return EAGAIN--which is handled.

The kernel should only return an EHOSTDOWN if `net.inet.arp.nd_bmaxtries' * `net.inet.arp.nd_retrans' (i.e. 3 * 1000ms) has passed without getting an ARP response. Even on a LAN, this is pretty unlikely (even with that peculiarly short 30-second ARP-address cache timeout).

Smells like a Xen+load+timing issue (not hand-wavy at all there, RVP!). It would be interesting to see the tcpdump capture from the DomU.

-RVP
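
For reference, the timing argument above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two constants are the values quoted in the message (3 tries, 1000ms retransmit), not values read from a live system, and classify_send_error() is a hypothetical helper, not anything taken from sshd or the kernel.

```python
import errno

# Assumed values, copied from the figures quoted in the message;
# a real system may be tuned differently.
ND_BMAXTRIES = 3       # net.inet.arp.nd_bmaxtries
ND_RETRANS_MS = 1000   # net.inet.arp.nd_retrans, in milliseconds


def ehostdown_window_ms(max_tries=ND_BMAXTRIES, retrans_ms=ND_RETRANS_MS):
    """Time without an ARP response before EHOSTDOWN would be expected."""
    return max_tries * retrans_ms


def classify_send_error(err):
    """Hypothetical helper: how a sender might treat an errno from a send."""
    if err in (errno.EAGAIN, errno.EWOULDBLOCK):
        # Send buffer is full; the caller waits and retries.
        return "retry"
    if err == errno.EHOSTDOWN:
        # Address resolution gave up; retrying immediately will not help.
        return "fatal"
    return "other"


if __name__ == "__main__":
    print("EHOSTDOWN window:", ehostdown_window_ms(), "ms")
    print("EAGAIN ->", classify_send_error(errno.EAGAIN))
    print("EHOSTDOWN ->", classify_send_error(errno.EHOSTDOWN))
```

With the assumed defaults the window comes out at 3000ms, so a stall shorter than that should surface as blocking or EAGAIN rather than EHOSTDOWN.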