On Fri, Apr 29, 2022 at 04:42:25PM +0100, Ian Chilton wrote:
> Hi,
>
> Not sure what the etiquette for this list is, so apologies if this is not
> appropriate as it's not a confirmed bug...
>
> I have a whole bunch of subnets which are static routed to a HSRP address,
> provided by a pair of Cisco routers, on a linknet VLAN. Actually, there is
> two VLANs, vlan209 and vlan409. In the case of v6, the HSRP IP is fe80::1, so
> I have routes to fe80::1%vlan209 and fe80::1%vlan409.
>
> This has worked fine for many weeks. On Wednesday evening I upgraded to 7.1.
>
> On Friday morning, I woke up to nearly 2,000 alerts, because some v6 had
> started flapping during the night.
>
> It turns out that fe80::1%vlan409 had randomly become unreachable.
>
> Every few minutes, it would become reachable again for 8 echo replies, then
> goes unreachable again.
>
> This is strange, because we use this same HSRP config / fe80::1 addresses for
> all of our VLANs and have done for years, without issue.
>
> Throughout this, the other OpenBSD host (still on 7.0), can access that
> address with no problem.
>
> Oddly, this host can still access fe80::1%vlan209 no problem.
>
> What seems to happen is, a stale ND entry appears and 8 pings succeed...
> the-gw1# ndp -a |grep vlan409 | grep fe80
> fe80::1%vlan409 00:05:73:a0:00:01 vlan409 23h57m56s S R
> ..
>
> Then this happens:
> the-gw1# ndp -a |grep vlan409 | grep fe80
> ndp: ioctl(SIOCGNBRINFO_IN6): Invalid argument
> ndp: failed to get neighbor information
> ndp: ioctl(SIOCGNBRINFO_IN6): Invalid argument
> ndp: failed to get neighbor information
> ndp: ioctl(SIOCGNBRINFO_IN6): Invalid argument
> ndp: failed to get neighbor information
> ndp: ioctl(SIOCGNBRINFO_IN6): Invalid argument
> ndp: failed to get neighbor information
> fe80::1%vlan409 (incomplete) vlan409 1s I 2
> Check again, and the entry has disappeared.
>
> A few mins later, the process repeats - 8 pings suddenly succeed and it
> disappears again.
>
> As I say though, fe80::1%vlan209 continues to work fine, as does
> fe80::1%vlan409 from the other host.
>
> fe80::1%vlan209 00:05:73:a0:00:01 vlan209 10s R R
>
> Interestingly, I did see a neighbour entry for fe80::1 on vlan409 on the
> Cisco which is the HSRP master which had a MAC address of the-gw1, which
> implied that the-gw1 is some how responding to ND requests for that IP....
> but I am not able to find those replies in a tcpdump.
>
> As a workaround, i've added another HSRP address, fe80::2 on the Ciscos and
> changed the static routes on this box to use that. After a few hours, that's
> still reachable ok.
>
> It might be total coincidence that this is after a 7.0 -> 7.1 upgrade, but
> thought i'd report it and see if anyone else is seeing any similar issues.
>
> Thanks,
>
> Ian

I had some issues with neighbour discovery lately, which started to appear
when I installed a new CPE. The issue was that the kernel generated outgoing
icmp6 messages with a hop limit, which then got dropped by pf before even
reaching the LAN.

The workaround was to do

	pass proto icmp6 allow-opts

In the meantime, bluhm@ has been working on a proper solution. See
https://marc.info/?l=openbsd-tech&m=165056094900572

	-Otto
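For reference, a minimal sketch of how the workaround rule might sit in a ruleset. The file path and the placement near the end of the ruleset are assumptions, not something stated in this thread. As I read pf.conf(5), packets carrying IPv4 options or IPv6 hop-by-hop or destination option headers are blocked by default, and allow-opts only takes effect when the rule carrying it is the last one to match.

```
# /etc/pf.conf fragment (sketch only; path and placement are assumptions)
#
# pf blocks packets with IP options / IPv6 option headers by default,
# even if a pass rule matches. allow-opts lifts that restriction for
# packets passed by this rule.
#
# pf is last-match, so keep this after any other rule that would also
# match icmp6, otherwise that later rule decides and allow-opts is lost.
pass proto icmp6 allow-opts
```

The syntax can be checked without loading the ruleset via `pfctl -nf /etc/pf.conf`, and the ruleset loaded with `pfctl -f /etc/pf.conf`. Whether this helps in the vlan409 case above is untested.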