On Mon, Sep 07, 2015 at 08:03:51PM +0200, [email protected] wrote:
> >Synopsis:	Weekly network disconnect with G4 Mac Mini (gem0)
> >Category:	powerpc
> >Environment:
> 	System      : OpenBSD 5.7
> 	Details     : OpenBSD 5.7-stable (GENERIC) #2: Wed Aug 12 23:45:47 CEST 2015
> 		      root@mini:/usr/src/sys/arch/macppc/compile/GENERIC
>
> 	Architecture: OpenBSD.macppc
> 	Machine     : macppc
> >Description:
>
> Hello,
>
> I'm experiencing a very strange bug with a headless G4 Mac Mini with
> the gem0 network driver. The network disconnects by itself and the
> machine loses all internet connectivity. It doesn't respond to
> pings/ssh even inside the local network. The rest of the machines in
> my network seem unaffected so it's not an issue regarding my router.
>
> >How-To-Repeat:
>
> I've narrowed it down to the following conditions:
>
> - It usually happens after about a week of regular usage. My G4 has a
>   fairly consistent usage pattern so it makes sense that the bug also
>   appears with a pattern.
>   Here are some sample dates where the bug was triggered:
>   - Restart on 12/Aug 04:15, happens again on 19/Aug 15:15
>   - Restart on 22/Aug 23:10, happens again on 31/Aug 12:46
>   - Restart on 31/Aug 15:10, happens again on 5/Sep 16:11
>
> - It once happened after just a couple hours heavily downloading data
>   (BitTorrent, so it can either be a number of connections issue or an
>   absolute tx/rx amount issue)
>
> - It can be fixed with "ifconfig gem0 down && ifconfig gem0 up", but
>   not by unplugging and replugging the cable. A system restart also
>   solves the issue.
>
> There are no error logs. The closest I can get to an error log is the
> fact that afpd times out, and I used this timestamp to establish the
> exact time of the issue.
>
> I also run an internet-dependent cron job which starts to fail
> consistently with the afpd error message, so I'm confident that the
> bug trigger time is correct.
>
> Here is what I can see on /var/log/messages for the time when the bug
> is triggered:
>
> Aug 22 23:09:57 mini afpd[8461]: afp_alarm: child timed out, entering disconnected state
> Aug 22 23:09:57 mini afpd[8461]: dsi_disconnect: entering disconnected state
> Aug 22 23:09:57 mini afpd[8461]: dsi_disconnect: entering disconnected state
>
> Another one:
>
> Aug 31 12:46:19 mini afpd[24528]: afp_alarm: child timed out, entering disconnected state
> Aug 31 12:46:19 mini afpd[24528]: dsi_disconnect: entering disconnected state
> Aug 31 12:46:19 mini afpd[24528]: dsi_wrtreply: Bad file descriptor
> Aug 31 12:46:19 mini afpd[24528]: dsi_disconnect: entering disconnected state
>
> This one is from yesterday:
>
> Sep 5 16:10:50 mini ntpd[6258]: 2 out of 4 peers valid
> Sep 5 16:10:50 mini ntpd[6258]: bad peer from pool pool.ntp.org (46.17.142.10)
> Sep 5 16:10:50 mini ntpd[6258]: bad peer from pool pool.ntp.org (194.140.131.21)
>
> I then try to grep on /var/log for timestamps which are close to that
> date, but there are no other error messages.
>
> The machine is running headless so I can't see if there are any error
> messages on screen.
>
> >Fix:
>
> ifconfig gem0 down && ifconfig gem0 up
>
> As to a permanent fix, here are some hypotheses:
>
> - It is clearly a network issue, since it's solved by an ifconfig
>   down+up
> - It is probably something driver-related, since I googled and looked
>   at the mailing lists, and there is nobody experiencing the same
>   issue. I guess there are few people using OpenBSD on a G4 with the
>   gem0 driver, so this may be an untested corner case of the driver.
>   If it were a system-wide issue, somebody else would probably have
>   noticed it.
> - This may be a data overflow. It can be either in a counter of
>   absolute tx/rx data, or number of connections. The weird weekly
>   periodicity has probably something to do with it. Or maybe
>   connections aren't properly cleaned up and eventually they fill up
>   some buffer? This is my best guess
> - It does not seem to affect the kernel/other processes since there
>   are no dmesg messages and the system doesn't require a restart.
>
> Can anybody give me more pointers to further narrow down the issue?

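One quick way to sanity-check the "weekly" pattern and the overflow guess is to work out the uptime before each failure from the restart/failure dates listed in the report, and then ask what average rate would be needed for a counter to wrap in that time. This is only a rough sketch: the 32-bit counter width is an assumption made for illustration, not something known about the gem driver, and the "units" could be bytes, packets or anything else.

```python
from datetime import datetime

# (restart, failure) pairs copied from the report; year 2015, local time.
EVENTS = [
    ("2015-08-12 04:15", "2015-08-19 15:15"),
    ("2015-08-22 23:10", "2015-08-31 12:46"),
    ("2015-08-31 15:10", "2015-09-05 16:11"),
]

def uptime_hours(restart, failure):
    """Hours between a restart and the next observed failure."""
    fmt = "%Y-%m-%d %H:%M"
    delta = datetime.strptime(failure, fmt) - datetime.strptime(restart, fmt)
    return delta.total_seconds() / 3600

def implied_rate(hours, counter_bits=32):
    """Average units/second needed to wrap a counter of the given width
    in `hours`. The 32-bit default is an assumption, not a driver fact."""
    return 2 ** counter_bits / (hours * 3600)

HOURS = [uptime_hours(r, f) for r, f in EVENTS]

for h in HOURS:
    print(f"{h / 24:.2f} days up; "
          f"a 32-bit counter would need ~{implied_rate(h):.0f} units/s to wrap")
```

The three intervals come out at roughly 7.5, 8.6 and 5.0 days, so the period is closer to "five to nine days" than strictly weekly. Under the 32-bit assumption the implied average rate is a few thousand units per second, which would be consistent with the failure arriving much sooner under heavy BitTorrent load, but that does not prove a counter is involved.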
I can't help you with the issue itself, but I can confirm that I've been
seeing the exact same issue with gem0 on my G4 Mac Mini here, and have
been for several releases. Randomly, gem0 just stops receiving/sending
packets and needs to be brought down and up again.

Landry
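Until the underlying cause is found, the down/up workaround could be automated on a headless box. The following is a minimal sketch, not a fix: it assumes Python 3 is installed from packages, that it runs as root (needed for ifconfig), and that a failed ping to the local router is a good enough sign that gem0 has wedged. `PROBE_HOST` is a hypothetical address and `check_and_reset` is a name made up for this example.

```python
import subprocess

IFACE = "gem0"               # interface named in the report
PROBE_HOST = "192.168.1.1"   # hypothetical: replace with the local router

def run(cmd):
    """Run a command quietly; True only if it exits with status 0."""
    try:
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL, timeout=15)
        return result.returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        return False

def check_and_reset(run_cmd=run, iface=IFACE, host=PROBE_HOST):
    """Send one ping; if it fails, bring the interface down and up again.

    Returns True if a reset was attempted, False if the ping succeeded.
    """
    if run_cmd(["ping", "-c", "1", host]):
        return False
    run_cmd(["ifconfig", iface, "down"])
    run_cmd(["ifconfig", iface, "up"])
    return True
```

Calling `check_and_reset()` every few minutes from root's crontab, and logging whenever it returns True, would also give more exact timestamps for when the interface stops passing traffic. Note that this hides the problem rather than explaining it, so it is worth keeping the log for the bug report.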
