On Mon, 15 Jun 2026 at 22:14, Stuart Henderson <[email protected]> wrote:
Thanks for your suggestions (including the unrelated one about not
forcing media). Based on your information I was able to dig a bit
deeper, and I now think I know exactly what's going on and how to
reliably work around it. I'm not sure what a proper fix would be, or
whether there is even anything to fix on OpenBSD's end.

At its core, the problem seems to be that after the PPPoE session is
terminated, it lingers at the ISP's end long enough to keep the next
pppoe0 session from coming up correctly with respect to IPv4.

When doing "ifconfig pppoe0 down", a packet trace shows that pppoe(4)
brings the link down cleanly, with the ISP cooperating as well:

   1  0.000000  Shuttle_fd:87:a0       JuniperNetwo_73:40:27  PPP LCP     26  Termination Request
   2  0.005625  JuniperNetwo_73:40:27  Shuttle_fd:87:a0       PPPoED      56  Active Discovery Terminate (PADT)
   3  0.005924  JuniperNetwo_73:40:27  Shuttle_fd:87:a0       PPP LCP     56  Termination Ack
   4  0.028349  Shuttle_fd:87:a0       JuniperNetwo_73:40:27  PPPoED      20  Active Discovery Terminate (PADT)

But pppoe0 almost immediately brings the connection back up, and it
appears the ISP is not ready for that yet.
It accepts the new connection but terminates it almost immediately:

   5  0.028359  Shuttle_fd:87:a0       Broadcast              PPPoED      38  Active Discovery Initiation (PADI)
   6  0.034543  JuniperNetwo_73:40:27  Shuttle_fd:87:a0       PPPoED      90  Active Discovery Offer (PADO) AC-Name='bras0.fi001.nl.freedomnet.nl'
   7  0.034561  Shuttle_fd:87:a0       JuniperNetwo_73:40:27  PPPoED      58  Active Discovery Request (PADR)
   8  0.040679  JuniperNetwo_73:40:27  Shuttle_fd:87:a0       PPPoED      90  Active Discovery Session-confirmation (PADS) AC-Name='bras0.fi001.nl.freedomnet.nl'
   9  0.040699  Shuttle_fd:87:a0       JuniperNetwo_73:40:27  PPP LCP     32  Configuration Request
  10  0.046332  JuniperNetwo_73:40:27  Shuttle_fd:87:a0       PPP LCP     56  Configuration Request
  11  0.046334  JuniperNetwo_73:40:27  Shuttle_fd:87:a0       PPP LCP     56  Configuration Ack
  12  0.046355  Shuttle_fd:87:a0       JuniperNetwo_73:40:27  PPP LCP     36  Configuration Ack
  13  0.046678  Shuttle_fd:87:a0       JuniperNetwo_73:40:27  PPP PAP     47  Authenticate-Request (Peer-ID='[email protected]', Password='1234')
  14  0.154346  JuniperNetwo_73:40:27  Shuttle_fd:87:a0       PPP PAP     56  Authenticate-Ack (Message='')
  15  0.154371  Shuttle_fd:87:a0       JuniperNetwo_73:40:27  PPP IPCP    44  Configuration Request
  16  0.154375  Shuttle_fd:87:a0       JuniperNetwo_73:40:27  PPP IPV6CP  36  Configuration Request
  17  0.160092  JuniperNetwo_73:40:27  Shuttle_fd:87:a0       PPP IPCP    56  Configuration Request
  18  0.160093  JuniperNetwo_73:40:27  Shuttle_fd:87:a0       PPP IPCP    56  Configuration Nak
  19  0.160111  Shuttle_fd:87:a0       JuniperNetwo_73:40:27  PPP IPCP    32  Configuration Ack
  20  0.160120  Shuttle_fd:87:a0       JuniperNetwo_73:40:27  PPP IPCP    44  Configuration Request
  21  0.160223  JuniperNetwo_73:40:27  Shuttle_fd:87:a0       PPP IPV6CP  56  Configuration Request
  22  0.160240  Shuttle_fd:87:a0       JuniperNetwo_73:40:27  PPP IPV6CP  36  Configuration Ack
  23  0.260422  JuniperNetwo_73:40:27  Shuttle_fd:87:a0       PPP LCP     56  Termination Request
  24  0.260445  Shuttle_fd:87:a0       JuniperNetwo_73:40:27  PPP LCP     26  Termination Ack

I tried the same thing with "link1" enabled on pppoe0.
The difference was that pppoe0 sent the PADI not immediately, but after
half a second. This delay was enough for the new session to come up
correctly. I don't think it's a good idea to run the connection with
link1 enabled all the time though, and it doesn't help after an unclean
reboot caused by a crash or power issue.

Also, I don't know what the ISP's timeout is after cleanly bringing down
a session. When doing "ifconfig pppoe0 destroy ; sleep $VALUE ; sh
/etc/netstart pppoe0", for values of 30 and lower the new connection
does *not* come up correctly, while for values of 40 and higher it comes
up correctly. A packet trace clearly shows what's going on:

   1   0.000000  JuniperNetwo_73:40:27  Shuttle_fd:87:a0       PPP LCP     56  Echo Request
   2   0.000018  Shuttle_fd:87:a0       JuniperNetwo_73:40:27  PPP LCP     30  Echo Reply
   3  10.007040  JuniperNetwo_73:40:27  Shuttle_fd:87:a0       PPP LCP     56  Echo Request
   4  10.839780  Shuttle_fd:87:a0       Broadcast              PPPoED      38  Active Discovery Initiation (PADI)

The session wasn't brought down nicely, so the ISP keeps sending PPP LCP
Echo Requests. Before the ISP times out the connection, pppoe0 brings up
a new session and the problem occurs again.

Note that this seems to be exactly the scenario that "option
PPPOE_TERM_UNKNOWN_SESSIONS" is for (as documented in pppoe(4)'s
manpage). But given that the problem also occurs when the connection is
brought down nicely and a new one is brought up "too soon", I'm not sure
that this would be a reliable solution. Plus I don't really want to run
a customized kernel.
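
For anyone who wants to repeat the timing test, the destroy / sleep /
netstart sequence described above can be wrapped roughly like this. It
is only a minimal sketch: the names run and reconnect_after and the
DRY_RUN switch are made up for illustration and are not part of OpenBSD.
By default it only prints the commands; setting DRY_RUN=0 (as root)
would actually execute them.

```shell
#!/bin/sh
# Hypothetical helper for the timing test; not part of the base system.
# DRY_RUN=1 (the default) only prints the commands instead of running them.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Tear down pppoe0, wait $1 seconds, then reconfigure it via netstart.
reconnect_after() {
    run ifconfig pppoe0 destroy
    run sleep "$1"
    run sh /etc/netstart pppoe0
}
```

Calling "reconnect_after 30" and "reconnect_after 40" corresponds to the
two cases compared here; running tcpdump on the underlying interface
alongside it would be needed to see which way a given run went.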
For $VALUE greater than about 40, this happens:

   1   0.000000  JuniperNetwo_73:40:27  Shuttle_fd:87:a0       PPP LCP     56  Echo Request
   2  10.007102  JuniperNetwo_73:40:27  Shuttle_fd:87:a0       PPP LCP     56  Echo Request
   3  20.014166  JuniperNetwo_73:40:27  Shuttle_fd:87:a0       PPP LCP     56  Echo Request
   4  30.021195  JuniperNetwo_73:40:27  Shuttle_fd:87:a0       PPP LCP     56  Echo Request
   5  30.178357  JuniperNetwo_73:40:27  Shuttle_fd:87:a0       PPPoED      56  Active Discovery Terminate (PADT)
   6  30.178612  JuniperNetwo_73:40:27  Shuttle_fd:87:a0       PPP LCP     56  Termination Request
   7  33.251946  Shuttle_fd:87:a0       Broadcast              PPPoED      38  Active Discovery Initiation (PADI)

The ISP times out the session (PADT), and three seconds later pppoe0
brings up a new session, which succeeds.

When rebooting, the connection does not appear to be brought down nicely
(I wasn't able to verify that using packet captures). As the machine
boots fairly quickly, it tries to bring up a new session while the ISP
is still in its PPP LCP Echo Request timeout phase, and this causes the
problem.

The workaround I've now implemented is a crude one, but it seems to be
reliable given my findings: it's simply adding a "!/bin/sleep 45" to the
top of /etc/hostname.pppoe0. It also covers *all* scenarios, even those
where the session is not brought down correctly because of the host
crashing or a short power failure. Adding 45 seconds to the boot is not
a big problem for me, so I'm happy with this workaround, but of course
I'm open to better suggestions. :)

> On 2026-06-15, Stuart Henderson <[email protected]> wrote:
> > On 2026-06-15, Jurjen Oskam <[email protected]> wrote:
> >> Hi,
> >>
> >> I recently switched to another ISP, and it uses PPPoE so I set things
> >> up using pppoe(4). This works fine about 50% of the time: after a
> >> reboot of the OpenBSD box there's a ~50% chance that the link comes up
> >> correctly. The strange thing is that IPv4 connectivity does *not* work
> >> in that case, while IPv6 connectivity (via dhcp6leased) *does* work.
> >> A tcpdump of IPv4 traffic on pppoe0 shows only outbound packets (TCP
> >> SYNs, UDP, etc), nothing coming back in.
> >>
> >> The PPPoE connection is from an Ethernet interface on my machine
> >> (igc1) directly to the ONT of the ISP. The ONT expects the PPPoE
> >> session on VLAN 6. The ISP has assigned a static IPv4 address.
> >>
> >> I suspect some sort of race condition occurring somewhere, but I
> >> wouldn't know where to start digging. What would be the best way of
> >> debugging this?
> >
> > pppoe starts trying to negotiate when the interface comes up
> > with whatever address families it knows about. IIRC changing the
> > available address families mid-negotiation can sometimes have issues.
> >
> > there's no proper way round it because with most interface types
> > (including pppoe), configuring an address or setting 'autoconf' (for
> > either v4 or v6) automatically brings the interface up at that point,
> > rather than when you explicitly use "up".
> >
> > some years ago dlg had some work to fix this in general (with the
> > compatibility case handled by netstart running "ifconfig up" in the
> > same situations that it would have been brought up anyway, but at
> > the end of configuring an interface rather than partway through)
> > but IIRC there were objections.
> >
> >> The hostname.if files:
> >>
> >> calvin# cat /etc/hostname.igc1
> >> media 2500baseT mtu 1508 up
> >
> > unrelated but I don't recommend forcing media (which is also liable to
> > disable auto duplex, which could result in one side using half-duplex
> > and the other full-duplex, which is a pig to debug). autodetect is
> > generally the sane way.
> >
> >> calvin# cat /etc/hostname.vlan6
> >> vnetid 6 parent igc1 mtu 1508 up
> >> calvin# cat /etc/hostname.pppoe0
> >> inet 45.142.146.140 255.255.255.255 NONE \
> >> pppoedev vlan6 authproto pap \
> >> authname '[email protected]' authkey '1234' \
> >> mtu 1500 up
> >> dest 0.0.0.1
> >> inet6 autoconf eui64
> >> !/sbin/route add default -ifp pppoe0 0.0.0.1
> >> !/sbin/route add -inet6 default -ifp pppoe0 fe80::%pppoe0
> >
> > try to do as little as possible in hostname.if between configuring the
> > v4 and v6 addresses. this does _not_ avoid the race but might mean that
> > you win it more often (maybe even consistently). this should be
> > equivalent to the file you have above:
> >
> > pppoedev vlan6 authproto pap authname '[email protected]' authkey '1234' mtu 1500
> > inet 45.142.146.140 255.255.255.255 0.0.0.1
> > inet6 autoconf
> > !/sbin/route add default -ifp pppoe0 0.0.0.1
> > !/sbin/route add -inet6 default -ifp pppoe0 fe80::%pppoe0
>
> p.s. there is also a race here where the route is added; the route with
> the 0.0.0.1 placeholder must be added _before_ the connection comes up,
> otherwise the remote side ip as seen in ifconfig will change from 0.0.0.1
> to the remote's real address, at which point you can no longer add the
> 0.0.0.1 route.
>
> --
> Please keep replies on the mailing list.
>

