I'm currently on vacation and can't look into this soon.

One thing that comes to mind: do these machines keep proper time, or are their
timer interrupts stopping because the KVM version is too new and a hypervisor
flag is missing? (Could someone with access to a real computer please chip in
with a link to the thread where this was discussed before, and the name of the
KVM flag?)

I fixed a bug in this area in the summer and you would observe this kind of 
behavior if timers are not running correctly.
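For reference, the Delay, Stale and Probe states in the ndp(8) output quoted
below come from Neighbor Unreachability Detection (RFC 4861, section 7.3),
where every transition is driven by a per-entry timer. The following is a
minimal Python sketch, not OpenBSD's nd6 code. It assumes the RFC's default
constants (DELAY_FIRST_PROBE_TIME = 5 s, RETRANS_TIMER = 1 s,
MAX_UNICAST_SOLICIT = 3) and a hypothetical `simulate_nud` helper that ticks
once per second, starting from the moment a Stale entry is used:

```python
from enum import Enum

# Default protocol constants from RFC 4861, section 10.
DELAY_FIRST_PROBE_TIME = 5   # seconds spent in DELAY before probing
RETRANS_TIMER = 1            # seconds between unicast Neighbor Solicitations
MAX_UNICAST_SOLICIT = 3      # probes sent before the entry is given up on


class State(Enum):
    DELAY = "Delay"
    PROBE = "Probe"
    REACHABLE = "Reachable"
    DELETED = "Deleted"


def simulate_nud(confirmed_at=None, horizon=20):
    """Hypothetical helper: step one neighbor entry through NUD, one tick
    per second, after a packet is sent to a Stale neighbor.

    confirmed_at is the tick at which reachability is confirmed
    (None means never). Returns (tick, state, probes_sent) tuples
    recorded at each state change or probe.
    """
    state = State.DELAY
    timer = DELAY_FIRST_PROBE_TIME
    probes = 0
    trace = [(0, state, probes)]
    for t in range(1, horizon + 1):
        # A confirmation moves DELAY or PROBE straight to REACHABLE.
        if confirmed_at is not None and t >= confirmed_at:
            state = State.REACHABLE
            trace.append((t, state, probes))
            break
        timer -= 1
        if timer > 0:
            continue  # timer still running, nothing happens this tick
        if state is State.DELAY:
            # Delay expired without confirmation: send the first probe.
            state = State.PROBE
            probes = 1
            timer = RETRANS_TIMER
        elif probes < MAX_UNICAST_SOLICIT:
            # Still in PROBE: retransmit another unicast solicitation.
            probes += 1
            timer = RETRANS_TIMER
        else:
            # Last probe went unanswered: drop the entry.
            state = State.DELETED
        trace.append((t, state, probes))
        if state is State.DELETED:
            break
    return trace


for tick, st, sent in simulate_nud():
    print(tick, st.value, sent)
```

With no confirmation, the entry sits in Delay for 5 s, sends three probes one
second apart and is deleted a second after the last one, which roughly mirrors
the lifecycle of the link-local entry described in the quoted report. Since
each step hangs off a timer, misbehaving timer interrupts would plausibly show
up here first.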

Thanks,
Florian

On October 23, 2018 6:18:19 PM GMT+02:00, "Aaron A. Glenn" <aag@bsd.network> wrote:
>Hello,
>
>AS57335 operates a 49-node anycast instance running exclusively OpenBSD.
>All instances are hosted virtual machines (aka "VPS" instances), and all
>are running a recent snapshot (kern.version=OpenBSD 6.4-current
>(GENERIC.MP) #381: Mon Oct 22 22:18:48 MDT 2018). Eleven of these nodes
>exhibit strange ndp(8) behavior that causes IPv6 BGP sessions to flap at
>inconsistent intervals.
>
>All eleven instances have the following in common:
>
>       vio(4) network interface
>       netmask of /64
>       do not use autoconf
>       Linux KVM hypervisor hosts (hw.vendor=QEMU & pvbus0 at mainbus0: KVM)
>       kern.timecounter.hardware=acpihpet0
>       v6 gateway is Cisco (based on OUI lookup)
>       no pf(4) rules
>
>BGP session traffic is the only regular/recurring v6 traffic on the nodes.
>Running a `ping6 google.com` in the background will occasionally allow BGP
>sessions to stay alive for 6-12 hours (in some cases, one to two days).
>
>From looking at `ndp -nA 1` output, the gateway address state changes to
>Delay with an expiry of ~45 seconds, then to Stale with an expiry of 24h.
>When it is set to Stale with a 24h expiry, a link-local address with the
>gateway's link-layer address in it (e.g. fe80::e25f:b9ff:fed1:527f%vio0) is
>added with a state of Delay and an expiry of 5 seconds. Once the expiry
>reaches 1 second remaining, the link-local entry begins three Probe
>attempts, and at the first attempt the gateway address expiry drops from
>23h59m55s to 5s. After the three Probe attempts, the link-local entry is
>removed, and the gateway address expiry goes to 45s or sometimes a bit less
>(38s is the lowest I've caught).
>
>I admit that NDP is not among the RFCs I've read, nor have I gone
>spelunking in the code base at all (I peeked once, and would need a buddy
>to have a useful look again).
>
>I am happy to add a pubkey to any and all systems exhibiting this behavior,
>and of course to provide any additional detail that might be useful.
>
>Thanks
>
>(please cc me as I am not subscribed to this list)
