On 2026-01-13, Alisdair MacLeod <[email protected]> wrote:
> I've been trying to resolve this issue for a few months now and finally
> have reached the point where I thought reaching out on here might yield
> some more useful insights.
>
> I recently added IPv6 configuration to my OpenBSD router and since then
> I get an LCP timeout every 2-3 days from which it never recovers.
> Looking at the debug output from pppoe it attempts to reconnect but when
> sending the PADI to reinitialise the connection there is an ENOBUFS
> error. This seems to lock the whole system up, I can't even log in
> locally it just hangs after receiving the username and password, and
> after a reboot everything reconnects and starts working immediately.
>
> Could the addition of IPv6 and related PF rules have caused it to start
> to hit max clusters? I don't see any other related issues in messages
> around hitting that limit which I would have expected to see.
> Everything was working without any issues until the addition of the IPv6
> connection.
>
> I've tried to include as much information here as I have but please let
> me know if more would be useful and I'll grab it.
>
> Really I just want to know if bumping kern.maxclusters is the correct
> solution here, and if so is there some guide or handy rule of thumb as
> to by how much?

Probably not the correct solution; this smells like a bug (an mbuf leak).
Bumping the limit would keep it running for longer once it hits the
problem, but depending on how fast the leak is, it might not be much
longer.

> kern.maxclusters=262144
>
> # netstat -m
> 3388 mbufs in use:
>         3242 mbufs allocated to data
>         88 mbufs allocated to packet headers
>         58 mbufs allocated to socket names and addresses
> 2471/2552 mbuf 2048 byte clusters in use (current/peak)
> 349/510 mbuf 2112 byte clusters in use (current/peak)

I'd monitor this over time and see whether the rise is sudden or gradual,
and whether you can correlate it with log entries. Is it happening after
the LCP timeout? Or is the LCP timeout happening after this has already
risen? e.g.

$ while true; do netstat -m | grep mbuf.2048; sleep 5; done | ts

Also: what interface types do you have on the system?
(ifconfig | grep '^[a-z]' - I know you included dmesg but that doesn't
show anything like wg, gif, etc. if you're using those). Do you use ipsec?

If you manage to catch it while the connection is down but before the
machine has locked up, does it help to do
'ifconfig igc0 down; ifconfig igc0 up'?

A capture of the output of 'systat mbuf' might also give a clue (it
updates frequently, you could leave it running in ssh).
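
If ts isn't installed (it comes from moreutils rather than the base
system, as far as I know), the monitoring loop above could be done with
date and awk instead. This is only a rough sketch: parse_clusters, sample,
the "run" argument and the INTERVAL variable are names made up for
illustration, and it assumes the netstat -m line format quoted above.

```shell
#!/bin/sh
# Sketch: timestamped samples of 2048-byte mbuf cluster usage.
# Assumes netstat -m prints a line of the form
#   2471/2552 mbuf 2048 byte clusters in use (current/peak)

# Read netstat -m output on stdin, print "current peak" for the 2048 pool.
parse_clusters() {
    awk '/mbuf 2048 byte clusters in use/ {
        split($1, a, "/")   # $1 is "current/peak"
        print a[1], a[2]
    }'
}

# Print one line: date, time, current, peak.
sample() {
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$(netstat -m | parse_clusters)"
}

# Only loop when called with "run" (hypothetical argument), so the
# functions can also be sourced on their own.
if [ "${1:-}" = "run" ]; then
    while true; do
        sample
        sleep "${INTERVAL:-5}"   # INTERVAL is an assumed override, default 5s
    done
fi
```

Saved as, say, mbufwatch.sh, it could be left running with
'sh mbufwatch.sh run >> mbuf.log' and the log compared against the
timestamps of the LCP timeout messages afterwards.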

