On Mon, 15 Jun 2026 at 22:14, Stuart Henderson <[email protected]> wrote:


Thanks for your suggestions (including the unrelated one about not forcing
media). Based on your information I was able to dig a bit deeper and I now
think I know exactly what's going on and how to reliably work around it.
I'm not sure about what a proper fix would be, or if there even is anything
to fix on OpenBSD's end.

At its core, the problem seems to be that after a PPPoE session is
terminated, it lingers long enough at the ISP's end to keep the next pppoe0
session from coming up correctly with respect to IPv4.

When doing "ifconfig pppoe0 down", a packet trace shows that pppoe(4)
correctly causes the link to go down nicely, with the ISP cooperating as well:

1   0.000000    Shuttle_fd:87:a0    JuniperNetwo_73:40:27   PPP LCP 26  Termination Request
2   0.005625    JuniperNetwo_73:40:27   Shuttle_fd:87:a0    PPPoED  56  Active Discovery Terminate (PADT)
3   0.005924    JuniperNetwo_73:40:27   Shuttle_fd:87:a0    PPP LCP 56  Termination Ack
4   0.028349    Shuttle_fd:87:a0    JuniperNetwo_73:40:27   PPPoED  20  Active Discovery Terminate (PADT)

But pppoe0 almost immediately brings the connection back up, and it appears
the ISP is not ready for that yet. It accepts the new connection but
terminates it almost immediately:

5   0.028359    Shuttle_fd:87:a0    Broadcast   PPPoED  38  Active Discovery Initiation (PADI)
6   0.034543    JuniperNetwo_73:40:27   Shuttle_fd:87:a0    PPPoED  90  Active Discovery Offer (PADO) AC-Name='bras0.fi001.nl.freedomnet.nl'
7   0.034561    Shuttle_fd:87:a0    JuniperNetwo_73:40:27   PPPoED  58  Active Discovery Request (PADR)
8   0.040679    JuniperNetwo_73:40:27   Shuttle_fd:87:a0    PPPoED  90  Active Discovery Session-confirmation (PADS) AC-Name='bras0.fi001.nl.freedomnet.nl'
9   0.040699    Shuttle_fd:87:a0    JuniperNetwo_73:40:27   PPP LCP 32  Configuration Request
10  0.046332    JuniperNetwo_73:40:27   Shuttle_fd:87:a0    PPP LCP 56  Configuration Request
11  0.046334    JuniperNetwo_73:40:27   Shuttle_fd:87:a0    PPP LCP 56  Configuration Ack
12  0.046355    Shuttle_fd:87:a0    JuniperNetwo_73:40:27   PPP LCP 36  Configuration Ack
13  0.046678    Shuttle_fd:87:a0    JuniperNetwo_73:40:27   PPP PAP 47  Authenticate-Request (Peer-ID='[email protected]', Password='1234')
14  0.154346    JuniperNetwo_73:40:27   Shuttle_fd:87:a0    PPP PAP 56  Authenticate-Ack (Message='')
15  0.154371    Shuttle_fd:87:a0    JuniperNetwo_73:40:27   PPP IPCP    44  Configuration Request
16  0.154375    Shuttle_fd:87:a0    JuniperNetwo_73:40:27   PPP IPV6CP  36  Configuration Request
17  0.160092    JuniperNetwo_73:40:27   Shuttle_fd:87:a0    PPP IPCP    56  Configuration Request
18  0.160093    JuniperNetwo_73:40:27   Shuttle_fd:87:a0    PPP IPCP    56  Configuration Nak
19  0.160111    Shuttle_fd:87:a0    JuniperNetwo_73:40:27   PPP IPCP    32  Configuration Ack
20  0.160120    Shuttle_fd:87:a0    JuniperNetwo_73:40:27   PPP IPCP    44  Configuration Request
21  0.160223    JuniperNetwo_73:40:27   Shuttle_fd:87:a0    PPP IPV6CP  56  Configuration Request
22  0.160240    Shuttle_fd:87:a0    JuniperNetwo_73:40:27   PPP IPV6CP  36  Configuration Ack
23  0.260422    JuniperNetwo_73:40:27   Shuttle_fd:87:a0    PPP LCP 56  Termination Request
24  0.260445    Shuttle_fd:87:a0    JuniperNetwo_73:40:27   PPP LCP 26  Termination Ack

I tried the same thing with "link1" enabled on pppoe0. The difference was
that pppoe0 sent the PADI not immediately, but after half a second. This
delay was enough for the new session to come up correctly. I don't think it's
a good idea to run the connection with link1 enabled all the time though,
and it doesn't solve an unclean reboot after a crash or power issue. Also I
don't know what the ISP's timeout is after cleanly bringing down a session.

When doing "ifconfig pppoe0 destroy ; sleep $VALUE ; sh /etc/netstart pppoe0",
the new connection does *not* come up correctly for values of 30 and lower,
while for values of 40 and higher it does. A packet trace of the failing
case clearly shows what's going on:

1   0.000000    JuniperNetwo_73:40:27   Shuttle_fd:87:a0    PPP LCP 56  Echo Request
2   0.000018    Shuttle_fd:87:a0    JuniperNetwo_73:40:27   PPP LCP 30  Echo Reply
3   10.007040   JuniperNetwo_73:40:27   Shuttle_fd:87:a0    PPP LCP 56  Echo Request
4   10.839780   Shuttle_fd:87:a0    Broadcast   PPPoED  38  Active Discovery Initiation (PADI)

The session wasn't brought down nicely, so the ISP keeps sending PPP LCP
Echo Requests. Before the ISP times out the old session, pppoe0 brings up
a new one and the problem occurs again. Note that this seems to be exactly
the scenario that "option PPPOE_TERM_UNKNOWN_SESSIONS" is meant for (as
documented in pppoe(4)'s manpage), but given that the problem also occurs
when the connection is brought down nicely and a new one is brought up
"too soon", I'm not sure it would be a reliable solution. Plus I don't
really want to run a customized kernel.

For values of $VALUE of 40 and higher, this happens:

1   0.000000    JuniperNetwo_73:40:27   Shuttle_fd:87:a0    PPP LCP 56  Echo Request
2   10.007102   JuniperNetwo_73:40:27   Shuttle_fd:87:a0    PPP LCP 56  Echo Request
3   20.014166   JuniperNetwo_73:40:27   Shuttle_fd:87:a0    PPP LCP 56  Echo Request
4   30.021195   JuniperNetwo_73:40:27   Shuttle_fd:87:a0    PPP LCP 56  Echo Request
5   30.178357   JuniperNetwo_73:40:27   Shuttle_fd:87:a0    PPPoED  56  Active Discovery Terminate (PADT)
6   30.178612   JuniperNetwo_73:40:27   Shuttle_fd:87:a0    PPP LCP 56  Termination Request
7   33.251946   Shuttle_fd:87:a0    Broadcast   PPPoED  38  Active Discovery Initiation (PADI)

The ISP times out the old session (PADT), and three seconds later pppoe0
brings up a new session, which succeeds.
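
For anyone who wants to repeat the timing test, the destroy/sleep/netstart
sequence can be wrapped in a small script. This is only a sketch: the
function names and the DRY_RUN switch are made up for illustration, and by
default it only prints the commands instead of running them.

```sh
#!/bin/sh
# Sketch only: cycle pppoe0 with a configurable pause between destroy and
# netstart. DRY_RUN, run() and cycle_pppoe0() are illustrative names, not
# part of OpenBSD.
DRY_RUN=${DRY_RUN:-1}   # 1 = print the commands only, 0 = really run them

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

cycle_pppoe0() {
    # $1 = pause in seconds between tearing down and re-creating pppoe0
    run ifconfig pppoe0 destroy
    run sleep "$1"
    run sh /etc/netstart pppoe0
}
```

Calling "cycle_pppoe0 45" prints the three commands; with DRY_RUN=0 (as
root) it runs them for real, which is how the 30 versus 40 second boundary
above could be re-checked.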


When rebooting, the connection does not appear to be brought down nicely
(I wasn't able to verify that using packet captures), and as the machine
boots fairly quickly it tries to bring up a new session while the ISP is still
in its PPP LCP Echo Request timeout phase - and this causes the problem.


The workaround I've now implemented is a crude one, but it seems to be
reliable given my findings: it's simply adding a "!/bin/sleep 45" to the top
of /etc/hostname.pppoe0.
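
For clarity, this is roughly what the file looks like with the workaround
applied. It's a sketch that assumes the trimmed-down layout suggested in the
quoted message below; only the sleep line is new, and the comment lines are
just annotations.

```
# Workaround: wait out the ISP's stale-session timeout before touching pppoe0.
!/bin/sleep 45
# Assumption: the remaining lines follow the layout from the quoted reply.
pppoedev vlan6 authproto pap authname '[email protected]' authkey '1234' mtu 1500
inet 45.142.146.140 255.255.255.255 0.0.0.1
inet6 autoconf
!/sbin/route add default -ifp pppoe0 0.0.0.1
!/sbin/route add -inet6 default -ifp pppoe0 fe80::%pppoe0
```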

It also covers *all* scenarios, even those where the session is not brought
down correctly because of the host crashing or a short power failure. Adding
45 seconds to the boot is not a big problem for me, so I'm happy with this
workaround, but of course I'm open to better suggestions. :)


> On 2026-06-15, Stuart Henderson <[email protected]> wrote:
> > On 2026-06-15, Jurjen Oskam <[email protected]> wrote:
> >> Hi,
> >>
> >> I recently switched to another ISP, and it uses PPPoE so I set things
> >> up using pppoe(4). This works fine about 50% of the time: after a
> >> reboot of the OpenBSD box there's a ~50% chance that the link comes up
> >> correctly. The strange thing is that IPv4 connectivity does *not* work
> >> in that case, while IPv6 connectivity (via dhcp6leased) *does* work. A
> >> tcpdump of IPv4 traffic on pppoe0 shows only outbound packets (TCP
> >> SYNs, UDP, etc), nothing coming back in.
> >>
> >> The PPPoE connection is from an Ethernet interface on my machine
> >> (igc1) directly to the ONT of the ISP. The ONT expects the PPPoE
> >> session on VLAN 6. The ISP has assigned a static IPv4 address.
> >>
> >> I suspect some sort of race condition occurring somewhere, but I
> >> wouldn't know where to start digging. What would be the best way of
> >> debugging this?
> >
> > pppoe starts trying to negotiate when the interface comes up
> > with whatever address families it knows about. IIRC changing the
> > available address families mid-negotiation can sometimes have issues.
> >
> > there's no proper way round it because with most interface types
> > (including pppoe), configuring an address or setting 'autoconf' (for
> > either v4 or v6) automatically brings the interface up at that point,
> > rather than when you explicitly use "up".
> >
> > some years ago dlg had some work to fix this in general (with the
> > compatibility case handled by netstart running "ifconfig up" in the
> > same situations that it would have been brought up anyway, but at
> > the end of configuring an interface rather than partway through)
> > but IIRC there were objections.
> >
> >> The hostname.if files:
> >>
> >> calvin# cat /etc/hostname.igc1
> >> media 2500baseT mtu 1508 up
> >
> > unrelated but I don't recommend forcing media (which is also liable to
> > disable auto duplex, which could result in one side using half-duplex
> > and the other full-duplex, which is a pig to debug). autodetect is
> > generally the sane way.
> >
> >> calvin# cat /etc/hostname.vlan6
> >> vnetid 6 parent igc1 mtu 1508 up
> >> calvin# cat /etc/hostname.pppoe0
> >> inet 45.142.146.140 255.255.255.255 NONE \
> >>         pppoedev vlan6 authproto pap \
> >>         authname '[email protected]' authkey '1234' \
> >>         mtu 1500 up
> >> dest 0.0.0.1
> >> inet6 autoconf eui64
> >> !/sbin/route add default -ifp pppoe0 0.0.0.1
> >> !/sbin/route add -inet6 default -ifp pppoe0 fe80::%pppoe0
> >
> > try to do as little as possible in hostname.if between configuring the
> > v4 and v6 addresses. this does _not_ avoid the race but might mean that
> > you win it more often (maybe even consistently). this should be
> > equivalent to the file you have above:
> >
> > pppoedev vlan6 authproto pap authname '[email protected]' authkey '1234' mtu 1500
> > inet 45.142.146.140 255.255.255.255 0.0.0.1
> > inet6 autoconf
> > !/sbin/route add default -ifp pppoe0 0.0.0.1
> > !/sbin/route add -inet6 default -ifp pppoe0 fe80::%pppoe0
>
> p.s. there is also a race here where the route is added; the route with
> the 0.0.0.1 placeholder must be added _before_ the connection comes up,
> otherwise the remote side ip as seen in ifconfig will change from 0.0.0.1
> to the remote's real address, at which point you can no longer add the
> 0.0.0.1 route.
>
> --
> Please keep replies on the mailing list.
>
