Staffan Thomén <[email protected]> writes:

> I recently got a PCEngines APU2 (not sure of the exact model) to
> replace my failing Soekris gateway
As Joseph taught Eliza to say, many others have the same sorts of
feelings. Except that my net5501 was fine, just slow, and I got an
apu2d4.

(As an aside, pcengines makes really nice hardware, and when I asked
them questions about input voltage/current because I want to run an apu
from solar/battery, I got actual answers to my technical questions
immediately from someone who really knew what they were doing.)

> and some strange behaviour appeared after I took it in production.
> After the system has been running for a few hours, it seems to stop
> being able to send packets on the internal wired network interface
> (possibly also the external, I can't tell) on a per-process basis, and
> seems to mostly affect IPv4. ICMP and UDP seems more prone to failure
> than TCP retransmission?).

This seems really unlikely to be a hardware issue...

> For instance, if I ping a host on my network from the gateway, only a
> few icmp requests go out (checked with tcpdump), sometimes one,
> sometimes ten but then it just sits there. The process seems to be
> stuck in select, if top is to believed.
>
> Attaching a debugger yields;
>
> (gdb) bt
> #0 0x000070e3f803e28a in poll () from /lib/libc.so.12
> #1 0x000000002f003a6f in main ()
>
> Once I quit the debugger, sometimes a few packets get sent (and received)
> again.
>
> Pressing ctrl-c stops the ping process properly, and it says it sent
> and received 8/8 packets or whatever.

So the issue is ping getting packets into the stack, not the interface,
and none are lost. What happens if you ping the apu from a host on the
lan?

> Disabling pf did nothing.
>
> Packet forwarding seems to work just fine.
>
> I also have a small daemon that I wrote that listens to pflog devices
> that decodes the log and sends the messages to syslog. These also seem
> to stop in the same maner as ping, but in read() in pcap_loop().
>
> Once the system is in this state, it can't reboot itself either,
> presumably waiting something somewhere.
Do you mean "typing shutdown hangs" or also "typing reboot hangs"?

> The apu2 is flashed with the latest firmware available, and that made
> no difference.
>
> Since this is a new system, I don't know if it's faulty or if netbsd
> is doing the strange stuff.

When you say "disabling pf", do you mean completely removing all pf
config and freshly booting?

> Advice? I will probably try to roll back my sources to this summer
> sometime and see if an older kernel works, the kernel that was
> optimized for my NET6501 appeared to not have the same problem, but I
> am not sure.

I am running netbsd-8 amd64 on mine, updating every month or so. I have
seen no issues like you describe. But surely there are lots of things
different.

  (gdb) bt
  #0 0x000070e3f803e28a in poll () from /lib/libc.so.12
  #1 0x000000002f003a6f in main ()

You might also try "ps alxw" and look at WCHAN.

The other advice I always give is

  netstat -s > BEFORE
  do stuff
  netstat -s > AFTER
  diff -u BEFORE AFTER
  # understand all counters that changed

The point is to notice things you aren't looking for.
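If you end up repeating the before/after comparison a lot, it can be wrapped in a small shell function. This is only a sketch: counter_diff is a made-up name, not an existing tool, and the ping target in the usage line is a placeholder address you would replace with a host on your lan.

```shell
#!/bin/sh
# counter_diff is a hypothetical helper, not a standard utility.
#   $1 = command that prints counters (for example "netstat -s")
#   $2 = command that exercises the problem (the "do stuff" step)
counter_diff() {
    before=$(mktemp) || return 1
    after=$(mktemp) || { rm -f "$before"; return 1; }

    sh -c "$1" > "$before" 2>&1      # snapshot the counters
    sh -c "$2" > /dev/null 2>&1      # do stuff
    sh -c "$1" > "$after" 2>&1       # snapshot again

    # diff exits 1 when the snapshots differ, which is the useful case.
    diff -u "$before" "$after"
    status=$?

    rm -f "$before" "$after"
    return $status
}
```

For example: counter_diff "netstat -s" "ping -c 10 192.0.2.1". Every counter in the output changed while the second command ran, and each one still needs to be understood by hand; the function only saves typing.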
