On Wed, Jan 29, 2014 at 4:45 AM, Sebastian Moeller <[email protected]> wrote: > Hi Dave, > > quick question, how does one turn of logging for babeld? It seems that if > daemonized it defaults to logging to /var/log/babeld.log (or similar). Is > setting the log file to /dev/null really the answer?
seems so. > (Since I have no the IPv6 issue not yet resolved, I assume babeld is > unhappy) I resorted to stopping babeld completely, but that feels like a > crutch... no daemon in an embedded system should ever write to flash in an uncontrollable manner. I will also argue that not being able to find the channel is a bug that messes with diversity routing in particular. > > Best > Sebastian > > > > On Jan 27, 2014, at 22:14 , Dave Taht <[email protected]> wrote: > >> certainly turn off the babeld log! I will leave it off in the next release. >> >> On Mon, Jan 27, 2014 at 4:10 PM, Steve Jenson <[email protected]> wrote: >>> Looking more, the buffer errors are showing up in syslog well before tmpfs >>> fills up. Is the memtester openwrt package available for cerowrt? I don't >>> see it under `Available packages`. >>> >>> Thanks, >>> Steve >>> >>> >>> On Mon, Jan 27, 2014 at 1:06 PM, Steve Jenson <[email protected]> wrote: >>>> >>>> On Fri, Jan 24, 2014 at 3:23 PM, Dave Taht <[email protected]> wrote: >>>>> >>>>> On Fri, Jan 24, 2014 at 6:08 PM, Steve Jenson <[email protected]> >>>>> wrote: >>>>>> Hi everybody, >>>>>> >>>>>> I've been using cerowrt as a secondary wifi network (just a single AP >>>>>> for >>>>>> now) for a few weeks now. Recently, my wndr3800 got stuck in a bad >>>>>> state and >>>>>> eventually rebooted. I've had this happen a few times now and am >>>>>> looking for >>>>>> ways to debug the issue. I'm new to cerowrt and openwrt so any advice >>>>>> is >>>>>> appreciated. >>>>>> >>>>>> Since I use it as a secondary network, this is no way critical. >>>>> >>>>> Yea! I appreciate caution before putting alpha software on your gw. >>>>> >>>>>> I'm not >>>>>> looking for free tech support but I couldn't find anything on the wiki >>>>>> about >>>>>> troubleshooting. I'd love to start a page and write some shell scripts >>>>>> to >>>>>> diagnose and report issues. I know that a cerowrt router is meant to be >>>>>> a >>>>>> research project rather a consumer device but these things seem helpful >>>>>> regardless. >>>>> >>>>> Sure, let me know your wiki account. I have been lax about granting >>>>> access of late as the signup process is overrun by spammers. >>>> >>>> >>>> My username is stevej on the wiki. Thanks! >>>> >>>> >>>>> >>>>>> Please let me know if you'd prefer I not email the list with these >>>>>> issues or >>>>>> if you'd rather I used trac or a different forum. >>>>>> >>>>> >>>>> The list is where most stuff happens. Also in the irc channel. >>>>> >>>>> If it gets to where it needs to be tracked we have a bugtracker at >>>>> >>>>> http://www.bufferbloat.net/projects/cerowrt/issues >>>>> >>>>> The first question I have is: Are you on comcast? Cerowrt >>>>> had a dhcpv6-pd implementation that "just worked" from feburary through >>>>> december. Regrettably they changed the RA announcement interval >>>>> to a really low number around then... and this triggers a firewall reload >>>>> every minute on everything prior to the release I point to below. >>>>> >>>>> If there is a memory leak somewhere that would have triggered it. >>>> >>>> >>>> I am on AT&T ADSL2+ with a Motorola NVG510 modem. >>>> >>>> >>>>> >>>>>> In this state, I can connect to the cerowrt base station via wifi but >>>>>> am >>>>>> unable to route packets to the internet. I can connect to :81 and see >>>>>> the >>>>>> login page but logging in results in a lua error at `/cgi-bin/luci` >>>>>> >>>>>> >>>>>> /usr/lib/lua/luci/dispatcher.lua:448: Failed to execute function >>>>>> dispatcher target for entry '/'. >>>>>> The called action terminated with an exception: >>>>>> /usr/lib/lua/luci/sauth.lua:87: Session data invalid! >>>>>> stack traceback: >>>>>> [C]: in function 'assert' >>>>>> /usr/lib/lua/luci/dispatcher.lua:448: in function 'dispatch' >>>>>> /usr/lib/lua/luci/dispatcher.lua:195: in function >>>>>> </usr/lib/lua/luci/dispatcher.lua:194> >>>>>> >>>>>> I can ssh into the device and cat various log files until the router >>>>>> hangs >>>>>> and reboots. here's a few relevant lines from my terminal history >>>>>> before the >>>>>> device rebooted (I'm assuming a watchdog kicked in and rebooted it). >>>>>> >>>>>> root@buffy2-1:~# ping google.com >>>>>> ping: bad address 'google.com' >>>>>> root@buffy2-1:~# free >>>>>> total used free shared buffers >>>>>> Mem: 126336 110332 16004 0 5616 >>>>>> -/+ buffers: 104716 21620 >>>>>> Swap: 0 0 0 >>>>>> root@buffy2-1:~# uptime >>>>>> 02:08:54 up 2 days, 1:26, load average: 0.10, 0.21, 0.17 >>>>>> root@buffy2-1:~# dmesg >>>>>> [ 0.000000] Linux version 3.10.24 (cero2@snapon) (gcc version 4.6.4 >>>>>> (OpenWrt/Linaro GCC 4.6-2013.05 r38226) ) #1 Tue Dec 24 >>>>>> 10:50:15 PST 2013 >>>>>> [skipping some lines] >>>>>> >>>>>> [ 13.156250] Error: Driver 'gpio-keys-polled' is already registered, >>>>>> aborting... >>>>>> [ 19.414062] IPv6: ADDRCONF(NETDEV_UP): ge00: link is not ready >>>>>> [ 19.421875] ar71xx: pll_reg 0xb8050010: 0x11110000 >>>>>> [ 19.429687] se00: link up (1000Mbps/Full duplex) >>>>>> [ 22.140625] IPv6: ADDRCONF(NETDEV_UP): sw00: link is not ready >>>>>> [ 23.351562] IPv6: ADDRCONF(NETDEV_CHANGE): sw00: link becomes ready >>>>>> [ 23.757812] ar71xx: pll_reg 0xb8050014: 0x11110000 >>>>>> [ 23.757812] ge00: link up (1000Mbps/Full duplex) >>>>>> [ 23.773437] IPv6: ADDRCONF(NETDEV_CHANGE): ge00: link becomes ready >>>>>> >>>>>> root@buffy2-1:~# ifconfig >>>>>> ge00 Link encap:Ethernet HWaddr 2C:B0:5D:A0:C5:B1 >>>>>> inet addr:192.168.1.138 Bcast:192.168.1.255 >>>>>> Mask:255.255.255.0 >>>>>> inet6 addr: fe80::2eb0:5dff:fea0:c5b1/64 Scope:Link >>>>>> inet6 addr: 2602:30a:2cdb:330:2eb0:5dff:fea0:c5b1/64 >>>>>> Scope:Global >>>>>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 >>>>>> RX packets:1469670 errors:0 dropped:8 overruns:0 frame:0 >>>>>> TX packets:547733 errors:0 dropped:0 overruns:0 carrier:0 >>>>>> collisions:0 txqueuelen:1000 >>>>>> RX bytes:229243410 (218.6 MiB) TX bytes:57304808 (54.6 MiB) >>>>>> Interrupt:5 >>>>>> >>>>>> lo Link encap:Local Loopback >>>>>> inet addr:127.0.0.1 Mask:255.0.0.0 >>>>>> inet6 addr: ::1/128 Scope:Host >>>>>> UP LOOPBACK RUNNING MTU:65536 Metric:1 >>>>>> RX packets:23689 errors:0 dropped:0 overruns:0 frame:0 >>>>>> TX packets:23689 errors:0 dropped:0 overruns:0 carrier:0 >>>>>> collisions:0 txqueuelen:0 >>>>>> RX bytes:2612713 (2.4 MiB) TX bytes:2612713 (2.4 MiB) >>>>>> >>>>>> pimreg Link encap:UNSPEC HWaddr >>>>>> 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 >>>>>> UP RUNNING NOARP MTU:1472 Metric:1 >>>>>> RX packets:0 errors:0 dropped:0 overruns:0 frame:0 >>>>>> TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 >>>>>> collisions:0 txqueuelen:0 >>>>>> RX bytes:0 (0.0 B) TX bytes:0 (0.0 B) >>>>>> >>>>>> se00 Link encap:Ethernet HWaddr 2E:B0:5D:A0:C5:B0 >>>>>> inet addr:172.30.42.1 Bcast:172.30.42.31 >>>>>> Mask:255.255.255.224 >>>>>> inet6 addr: 2602:30a:2cdb:330:2eb0:5dff:fea0:c5b1/64 >>>>>> Scope:Global >>>>> >>>>> How are you assigning your ipv6 addresses? >>>> >>>> >>>> It's been a while since I messed with this but I think IPv6 is assigned >>>> thanks to 6relayd? My modem has IPv6 enabled but no DHCPv6 options that I >>>> can find. Here's how cerowrt is configured. >>>> >>>> root@buffy2-1:/overlay/etc/config# cat 6relayd >>>> config server 'default' >>>> option fallback_relay 'rd dhcpv6 ndp' >>>> list network 'ge00' >>>> list network 'ge01' >>>> list network 'gw00' >>>> list network 'gw01' >>>> list network 'gw10' >>>> list network 'gw11' >>>> list network 'se00' >>>> list network 'sw00' >>>> list network 'sw10' >>>> option rd 'relay' >>>> option dhcpv6 'relay' >>>> option ndp 'relay' >>>> option master 'ge00' >>>> >>>>> >>>>>> inet6 addr: fe80::2cb0:5dff:fea0:c5b0/64 Scope:Link >>>>>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 >>>>>> RX packets:0 errors:0 dropped:0 overruns:0 frame:0 >>>>>> TX packets:191740 errors:0 dropped:0 overruns:0 carrier:0 >>>>>> collisions:0 txqueuelen:1000 >>>>>> RX bytes:0 (0.0 B) TX bytes:42184988 (40.2 MiB) >>>>>> Interrupt:4 >>>>>> >>>>>> sw00 Link encap:Ethernet HWaddr 2C:B0:5D:A0:C5:B0 >>>>>> inet addr:172.30.42.65 Bcast:172.30.42.95 >>>>>> Mask:255.255.255.224 >>>>>> inet6 addr: 2602:30a:2cdb:330:2eb0:5dff:fea0:c5b1/64 >>>>>> Scope:Global >>>>>> inet6 addr: fe80::2eb0:5dff:fea0:c5b0/64 Scope:Link >>>>>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 >>>>>> RX packets:70239 errors:0 dropped:0 overruns:0 frame:0 >>>>>> TX packets:286967 errors:0 dropped:0 overruns:0 carrier:0 >>>>>> collisions:0 txqueuelen:1000 >>>>>> RX bytes:15590189 (14.8 MiB) TX bytes:127357293 (121.4 MiB) >>>>>> >>>>>> root@buffy2-1:~# less /var/log/babeld.log >>>>>> Couldn't determine channel of interface sw00: Invalid argument. >>>>>> Couldn't determine channel of interface sw00: Invalid argument. >>>>>> Couldn't determine channel of interface sw00: Invalid argument. >>>>>> Couldn't determine channel of interface sw00: Invalid argument. >>>>>> send: Cannot assign requested address >>>>>> send: Cannot assign requested address >>>>>> send: Cannot assign requested address >>>>>> Couldn't determine channel of interface sw00: Invalid argument. >>>>>> Couldn't determine channel of interface sw00: Invalid argument. >>>>>> Couldn't determine channel of interface sw00: Invalid argument. >>>>>> Couldn't determine channel of interface sw00: Invalid argument. >>>>>> netlink_read: recvmsg(): No buffer space available >>>>>> Couldn't determine channel of interface sw00: Invalid argument. >>>>>> Couldn't determine channel of interface sw00: Invalid argument. >>>>>> Couldn't determine channel of interface sw00: Invalid argument. >>>>>> Couldn't determine channel of interface sw00: Invalid argument. >>>>>> Couldn't determine channel of interface sw00: Invalid argument. >>>>>> Couldn't determine channel of interface sw00: Invalid argument. >>>>>> Couldn't determine channel of interface sw00: Invalid argument. >>>>>> Couldn't determine channel of interface sw00: Invalid argument. >>>>>> Couldn't determine channel of interface sw00: Invalid argument. >>>>>> Couldn't determine channel of interface sw00: Invalid argument. >>>>>> Couldn't determine channel of interface sw00: Invalid argument. >>>>>> Couldn't determine channel of interface sw00: Invalid argument. >>>>>> Couldn't determine channel of interface sw00: Invalid argument. >>>>>> Couldn't determine channel of interface sw00: Invalid argument. >>>>>> Couldn't determine channel of interface sw00: Invalid argument. >>>>>> Couldn't determine channel of interface sw00: Invalid argument. >>>>>> Couldn't determine channel of interface sw00: Invalid argument. >>>>>> Couldn't determine channel of interface sw00: Invalid argument. >>>>>> Couldn't determine channel of interface sw00: Invalid argument. >>>>>> Couldn't determine channel of interface sw00: Invalid argument. >>>>>> Couldn't determine channel of interface sw00: Invalid argument. >>>>>> Couldn't determine channel of interface sw00: Invalid argument. >>>>>> Couldn't determine channel of interface sw00: Invalid argument. >>>>>> Couldn't determine channel of interface sw00: Invalid argument. >>>>>> Couldn't determine channel of interface sw00: Invalid argument. >>>>> >>>>> This is a problem in babel detecting the channel on a "normal" >>>>> rather than a mesh interface. It's bugged me a long while, but >>>>> haven't got around to finding what triggers it. Might "fix" it by >>>>> acquiring the channel at babel start time from /etc/config/wireless. >>>>> >>>>> It messes up the diversity routing calculation, grump. >>>>> >>>>> There is a possibility a logfile got really big, but this one >>>>> generally doesn't, but I should turn off logging in some >>>>> future release... >>>> >>>> >>>> I believe I've tracked down part of what's going on. It looks like my >>>> tmpfs is filling up 100% and then the device enters a bad state: >>>> >>>> After 24 hours, with tmpfs at 50%, babeld.log is the largest file by far >>>> in tmpfs and the only file that appears to be growing (based on `du`). It >>>> takes about 48 hours from reboot to fill up tmpfs on my device. >>>> >>>> # sort babeld.log | uniq -c |sort -rn |head >>>> >>>> 503236 Couldn't determine channel of interface sw00: Invalid argument. >>>> >>>> 1376 netlink_read: recvmsg(): No buffer space available >>>> >>>> 3 send: Cannot assign requested address >>>> >>>> # wc -l babeld.log >>>> >>>> 504617 babeld.log >>>> >>>> I sped up system failure by using `dd` to fill up tmpfs and the system >>>> became immediately unusable. >>>> >>>> This also explains the luci session store errors as sessions are stored in >>>> tmpfs. >>>> >>>> The other buffer issues may or may not be related to this. >>>> >>>> Best, >>>> Steve >>> >>> >> >> >> >> -- >> Dave Täht >> >> Fixing bufferbloat with cerowrt: >> http://www.teklibre.com/cerowrt/subscribe.html >> _______________________________________________ >> Cerowrt-devel mailing list >> [email protected] >> https://lists.bufferbloat.net/listinfo/cerowrt-devel > -- Dave Täht Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html _______________________________________________ Cerowrt-devel mailing list [email protected] https://lists.bufferbloat.net/listinfo/cerowrt-devel
