On Fri, Apr 29, 2022 at 04:42:25PM +0100, Ian Chilton wrote:

> Hi,
> 
> Not sure what the etiquette for this list is, so apologies if this is not 
> appropriate as it's not a confirmed bug...
> 
> I have a whole bunch of subnets which are statically routed to an HSRP 
> address, provided by a pair of Cisco routers, on a linknet VLAN. Actually, 
> there are two VLANs, vlan209 and vlan409. In the case of v6, the HSRP IP is 
> fe80::1, so I have routes to fe80::1%vlan209 and fe80::1%vlan409.
> 
> This has worked fine for many weeks. On Wednesday evening I upgraded to 7.1.
> 
> On Friday morning, I woke up to nearly 2,000 alerts, because some v6 had 
> started flapping during the night.
> 
> It turns out that fe80::1%vlan409 had randomly become unreachable.
> 
> Every few minutes, it would become reachable again for 8 echo replies, then 
> go unreachable again.
> 
> This is strange, because we use this same HSRP config / fe80::1 addresses for 
> all of our VLANs and have done for years, without issue.
> 
> Throughout this, the other OpenBSD host (still on 7.0), can access that 
> address with no problem.
> 
> Oddly, this host can still access fe80::1%vlan209 no problem.
> 
> What seems to happen is, a stale ND entry appears and 8 pings succeed...
> the-gw1# ndp -a |grep vlan409 | grep fe80
> fe80::1%vlan409                      00:05:73:a0:00:01 vlan409 23h57m56s S R
> ..
> 
> Then this happens:
> the-gw1# ndp -a |grep vlan409 | grep fe80
> ndp: ioctl(SIOCGNBRINFO_IN6): Invalid argument
> ndp: failed to get neighbor information
> ndp: ioctl(SIOCGNBRINFO_IN6): Invalid argument
> ndp: failed to get neighbor information
> ndp: ioctl(SIOCGNBRINFO_IN6): Invalid argument
> ndp: failed to get neighbor information
> ndp: ioctl(SIOCGNBRINFO_IN6): Invalid argument
> ndp: failed to get neighbor information
> fe80::1%vlan409                      (incomplete)      vlan409 1s        I  2
> Check again, and the entry has disappeared.
> 
> A few mins later, the process repeats - 8 pings suddenly succeed and it 
> disappears again.
> 
> As I say though, fe80::1%vlan209 continues to work fine, as does 
> fe80::1%vlan409 from the other host.
> 
> fe80::1%vlan209                      00:05:73:a0:00:01 vlan209 10s       R R
> 
> Interestingly, I did see a neighbour entry for fe80::1 on vlan409 on the 
> Cisco which is the HSRP master, and it had the MAC address of the-gw1. That 
> implies that the-gw1 is somehow responding to ND requests for that IP... 
> but I am not able to find those replies in a tcpdump.
> 
> As a workaround, I've added another HSRP address, fe80::2, on the Ciscos and 
> changed the static routes on this box to use that. After a few hours, that's 
> still reachable ok.
> 
> It might be total coincidence that this is after a 7.0 -> 7.1 upgrade, but 
> thought I'd report it and see if anyone else is seeing any similar issues.
> 
> Thanks,
> 
> Ian

I had some issues with neighbour discovery lately, which started to
appear when I installed a new CPE.

The issue was that the kernel generated outgoing icmp6 messages with a
hop-by-hop option, which pf then dropped before they even reached the
LAN (pf blocks packets carrying options unless the matching pass rule
says allow-opts).

The workaround was to do

        pass proto icmp6 allow-opts
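
For placement, a minimal sketch of how such a rule might sit in
pf.conf. The block rule is an assumed layout for illustration only,
not a ruleset taken from either machine. Since pf applies the last
matching rule, the allow-opts rule needs to come after any broader
rule that would otherwise match the icmp6 traffic:

        # assumed default policy, shown only for context
        block log all
        # last matching rule for icmp6, so its allow-opts applies
        pass proto icmp6 allow-opts

Running pfctl -nf /etc/pf.conf parses the file without loading it, and
pfctl -f /etc/pf.conf loads it.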

In the meantime, bluhm@ has been working on a proper solution. See
https://marc.info/?l=openbsd-tech&m=165056094900572

        -Otto
