On 01/08/15 00:56, Rich Brown wrote:
Folks,

I would like some suggestions for debugging a problem I have with CeroWrt.

I have deployed CeroWrt 3.10.50-1 on two WNDR3800's at a hospitality business 
nearby. These routers have worked fine in my house in the past. WNDR3800 #1 
talks to my DSL modem (wifi disabled), and WNDR3800 #2 has its WAN wired to the 
LAN side of #1 (routed, no NAT). I also have a third router (Netgear something 
or another, running stock firmware and NAT) with its WAN port wired to WNDR3800 
#2 LAN, at the far end of the property. While in operation, they work as 
expected, and fq_codel is doing its job (also as expected). The setup - all 
dashed lines are Ethernet:

[ Internet ] --- [Fairpoint DSL Modem] --- [WNDR3800 #1] --- [WNDR3800 #2] --- 
[Netgear ?]

The problem is that the Wifi locks up on either/both WNDR3800's after a while 
(a day or so). Guests complain that they cannot connect to the wifi. If the 
innkeepers reboot the router, Presto! it's fine for a while longer.

I have only been present once when it was in the stuck state, and wired access 
to/through the WNDR3800 #1 was fine. My Macbook was *not* able to get a 
connection through wifi, but both Wifi Explorer on the Mac and Wifi-Analyzer on 
android could see a healthy signal level (and no overlapping channels) on the 
expected channel. Here's the wifi setup:

- I only have one interface on each of the 2.4 and 5 GHz radios. (I turn off 
babel and the other wifi channel)
- All SSIDs (on each of the routers) are the same string "Loch Lyme Lodge"
- The wifi channels are different (1, 6, 11 for 2.4GHz, 36 & 44 for 5GHz) for 
all the routers

My questions:

- Any thoughts about what might be causing this?

Sorry to hear that Rich.  Be prepared to give up :).

My brother's router is immediately allergic to one of his wifi devices (not sure if the effect was limited to wifi though). That's the variable I'd instinctively blame - wifi driver / hardware and "incompatibility" bugs. Two incompatibilities I've seen were "known problems". If it happens with the original firmware on a popular device, there's likely a report of it online somewhere, though not necessarily a fix.

I wouldn't know how to fix it. If my instinct is right you ideally want to reproduce the exact chipset that breaks the AP. Which I wouldn't know how to check unless I could pin it down to a laptop and look at that :(. Don't know about phones.

Since the "signal" stays up, you can't even run it in parallel with an automatic fallback. A manual poweroff would still be required.

- What should I look for (log files, symptoms, etc) next time I get the word 
that it has happened?

Many thanks!

Rich

Given your symptoms, you could see if the hostapd process has crashed and isn't running any more (in "ps"), or is looping (100% cpu in "top"). Unfortunately procd doesn't seem to log daemon deaths.
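Something like the following could be pasted into an ssh session for the "is it still running" check. It is only a sketch: the function names are made up, and it assumes the daemon shows up as "hostapd" in /proc/PID/comm, which I have not verified on CeroWrt.

```shell
# Return 0 if any process whose name (per /proc/PID/comm) equals $1 exists.
is_running() {
    for f in /proc/[0-9]*/comm; do
        [ -r "$f" ] || continue
        if [ "$(cat "$f" 2>/dev/null)" = "$1" ]; then
            return 0
        fi
    done
    return 1
}

# Print a one-line status for a daemon, e.g.: report_daemon hostapd
report_daemon() {
    if is_running "$1"; then
        echo "$1: running"
    else
        echo "$1: NOT running"
    fi
}

# For the "looping" case, a single batch-mode snapshot of top can be
# filtered by hand (BusyBox top accepts -b and -n):
#   top -b -n 1 | grep hostapd
```

Run "report_daemon hostapd" while the wifi is stuck, before anyone reboots the router, and note the answer along with the time.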

At the most basic level you could make sure connection logs are enabled in hostapd, the access-point daemon (seems so by default), and perhaps send them somewhere permanent[1]. Logs are always nice. Each connection entry includes the client's unique MAC. Fwiw you could then look up the first three bytes of the MAC (the "OUI") online to see the vendor, e.g. Broadcom.
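For the lookup step, the OUI is just the first three octets of the MAC. A small helper along these lines (the name oui_of is made up) normalizes an address copied out of the log so it can be pasted into a vendor lookup page:

```shell
# Print the OUI (first three octets, upper case, colon-separated) of a MAC
# address given as aa:bb:cc:dd:ee:ff or aa-bb-cc-dd-ee-ff.
oui_of() {
    printf '%s\n' "$1" | sed 's/-/:/g' | tr 'abcdef' 'ABCDEF' | cut -d: -f1-3
}

# Example with a made-up address:
#   oui_of ac:de:48:00:11:22
```

Bear in mind the OUI names whoever registered the address block, which is not always the maker of the wifi chipset.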

Thought: to confirm exact failure times, leave an old phone / Raspberry Pi w/wifi plugged in and run <waves hand vaguely> a ping monitor against it from the AP, logging to a USB stick to avoid filling the NAND? "mount /mnt/usb-stick; cd /mnt/usb-stick; nohup ping <client-ip> >>ping.log &".

* nohup may require installing coreutils-nohup
** coreutils-nohup not present in cero package list :'(. Maybe try grabbing packages from a matching version of openwrt.
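Plain ping output carries no wall-clock time, so a slightly less hand-wavy version of the monitor might log one timestamped up/down line per probe. This is a sketch only: log_probe is a made-up name, CLIENT_IP is a placeholder for whatever address the always-on wifi client gets, and the interval is arbitrary.

```shell
# Run a probe command and append a timestamped "up" or "down" line to a log.
# Usage: log_probe LOGFILE COMMAND [ARGS...]
log_probe() {
    logfile="$1"
    shift
    if "$@" >/dev/null 2>&1; then
        state=up
    else
        state=down
    fi
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$state" >> "$logfile"
}

# Usage on the AP, after mounting the USB stick:
#   cd /mnt/usb-stick
#   while :; do log_probe ping.log ping -c 1 -W 2 "$CLIENT_IP"; sleep 30; done &
```

The first "down" line after a run of "up" lines then gives the failure time to within one interval, provided the router's clock is right.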

[1] syslog to usb: http://wiki.openwrt.org/doc/howto/log.essentials#output

I guess you'd want the same "nohup CMD>>logfile &" treatment with the command they suggest, put in the /etc/rc.local boot script. The same "logread" will also show any default-enabled messages from the kernel.

Alan
_______________________________________________
Cerowrt-devel mailing list
[email protected]
https://lists.bufferbloat.net/listinfo/cerowrt-devel
