On Fri, Apr 29, 2022 at 04:42:25PM +0100, Ian Chilton wrote:
> Hi,
>
> Not sure what the etiquette for this list is, so apologies if this is not
> appropriate as it's not a confirmed bug...
>
> I have a whole bunch of subnets which are static routed to an HSRP address,
> provided by a pair of Cisco routers, on a linknet VLAN. Actually, there are
> two VLANs, vlan209 and vlan409. In the case of v6, the HSRP IP is fe80::1, so
> I have routes to fe80::1%vlan209 and fe80::1%vlan409.
>
> This has worked fine for many weeks. On Wednesday evening I upgraded to 7.1.
>
> On Friday morning, I woke up to nearly 2,000 alerts, because some v6 had
> started flapping during the night.
>
> It turns out that fe80::1%vlan409 had randomly become unreachable.
>
> Every few minutes, it would become reachable again for 8 echo replies, then
> go unreachable again.
>
> This is strange, because we use this same HSRP config / fe80::1 addresses for
> all of our VLANs and have done for years, without issue.
>
> Throughout this, the other OpenBSD host (still on 7.0), can access that
> address with no problem.
>
> Oddly, this host can still access fe80::1%vlan209 no problem.
>
> What seems to happen is, a stale ND entry appears and 8 pings succeed...
> the-gw1# ndp -a |grep vlan409 | grep fe80
> fe80::1%vlan409 00:05:73:a0:00:01 vlan409 23h57m56s S R
> ..
>
> Then this happens:
> the-gw1# ndp -a |grep vlan409 | grep fe80
> ndp: ioctl(SIOCGNBRINFO_IN6): Invalid argument
> ndp: failed to get neighbor information
> ndp: ioctl(SIOCGNBRINFO_IN6): Invalid argument
> ndp: failed to get neighbor information
> ndp: ioctl(SIOCGNBRINFO_IN6): Invalid argument
> ndp: failed to get neighbor information
> ndp: ioctl(SIOCGNBRINFO_IN6): Invalid argument
> ndp: failed to get neighbor information
> fe80::1%vlan409 (incomplete) vlan409 1s I 2
> Check again, and the entry has disappeared.
>
> A few mins later, the process repeats - 8 pings suddenly succeed and it
> disappears again.
>
> As I say though, fe80::1%vlan209 continues to work fine, as does
> fe80::1%vlan409 from the other host.
>
> fe80::1%vlan209 00:05:73:a0:00:01 vlan209 10s R R
>
> Interestingly, I did see a neighbour entry for fe80::1 on vlan409 on the
> Cisco which is the HSRP master which had a MAC address of the-gw1, which
> implied that the-gw1 is somehow responding to ND requests for that IP....
> but I am not able to find those replies in a tcpdump.
>
> As a workaround, I've added another HSRP address, fe80::2 on the Ciscos and
> changed the static routes on this box to use that. After a few hours, that's
> still reachable ok.
>
> It might be total coincidence that this is after a 7.0 -> 7.1 upgrade, but
> thought I'd report it and see if anyone else is seeing any similar issues.
>
> Thanks,
>
> Ian

I had some issues with neighbour discovery lately, which started to
appear when I installed a new CPE.
The issue was that the kernel generated outgoing icmp6 messages with a
hop-by-hop options header, which pf then dropped before they even
reached the lan.
The workaround was to add this rule to pf.conf:

	pass proto icmp6 allow-opts
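
To check whether pf is what drops the packets, something like the
following might help. This is only a sketch: the "block log" line is an
assumption about the rest of the ruleset, not something taken from either
of our configs, and it may well not be what is going on in your case.

```
# Hypothetical pf.conf excerpt; only the pass rule is the actual workaround.
# Default deny, with dropped packets logged to pflog0 (assumed rule).
block log all
# Pass ICMPv6 even when the packet carries IPv6 option headers.
pass proto icmp6 allow-opts
```

With logging on the block rule, dropped packets should show up in
"tcpdump -n -e -ttt -i pflog0 icmp6" while the neighbour entry goes
incomplete.
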
In the meantime, bluhm@ has been working on a proper solution. See
https://marc.info/?l=openbsd-tech&m=165056094900572

	-Otto