Re: [Cerowrt-devel] 3.3.6-2

Sebastian Moeller Sat, 02 Jun 2012 00:03:54 -0700

Hi Robert,

It took me some time to get a bit further with more testing...


On May 25, 2012, at 3:38 PM, Robert Bradley wrote:

> On 25/05/12 19:25, Sebastian Moeller wrote:
>> Hi Robert,
>> 
>> 
>> On May 25, 2012, at 4:11 AM, Robert Bradley wrote:
>> 
>>> That said, unless we can
>>> find an obvious reason for /tmp overfilling, I'm not sure we should do
>>> that, since it will cause problems upgrading.
>>      But if I create a file of 30000 1KB blocks in /tmp (so that around 400 
>> KB stay available), the router goes into OOM, so I do not think that 
>> upgrading would work well if it really needs so much memory? I have a hunch 
>> that the OpenWrt base under cerowrt does not assume something as big and 
>> demanding as the 11MB bind9 named process running :)
> The flash memory size is about 16MB for the WNDR3700, so it's probably ok for 
> normal use.  It's less certain with BIND and everything else running, 
> although it'd be possible to restart the router, stop BIND and then update.

        From my totally unscientific testing I am quite convinced that even 
16MB of /tmp used will make the router spiral into reboot if used over the 5GHz 
radio to the wan port. However, if I use one of the wired ports I get plenty of 
the following (not always hostapd):


Jun  1 23:41:08 nacktmulle kern.warn kernel: [185428.417968] hostapd: page 
allocation failure: order:0, mode:0x4020
Jun  1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] Call Trace:
Jun  1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<802850a4>] 
dump_stack+0x8/0x34
Jun  1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<800b4548>] 
warn_alloc_failed+0xe8/0x10c
Jun  1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<800b684c>] 
__alloc_pages_nodemask+0x5a0/0x600
Jun  1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<800da070>] 
new_slab+0xa8/0x280
Jun  1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<80286b18>] 
__slab_alloc.isra.60.constprop.63+0x25c/0x2fc
Jun  1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<800dba48>] 
__kmalloc_track_caller+0x88/0x140
Jun  1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<801e0854>] 
__alloc_skb+0x80/0x140
Jun  1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<801e0930>] 
dev_alloc_skb+0x1c/0x48
Jun  1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<801d0c74>] 
ag71xx_poll+0x430/0x65c
Jun  1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<801e8c10>] 
net_rx_action+0x88/0x1c8
Jun  1 23:41:09 nacktmulle kern.warn kernel: [185429.484375] hostapd: page 
allocation failure: order:0, mode:0x4020
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Call Trace:
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<802850a4>] 
dump_stack+0x8/0x34
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<800b4548>] 
warn_alloc_failed+0xe8/0x10c
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<800b684c>] 
__alloc_pages_nodemask+0x5a0/0x600
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<800da070>] 
new_slab+0xa8/0x280
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<80286b18>] 
__slab_alloc.isra.60.constprop.63+0x25c/0x2fc
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<800dba48>] 
__kmalloc_track_caller+0x88/0x140
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<801e0854>] 
__alloc_skb+0x80/0x140
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<801e0930>] 
dev_alloc_skb+0x1c/0x48
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<801d0c74>] 
ag71xx_poll+0x430/0x65c
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] 
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Mem-Info:
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Normal per-cpu:
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] CPU    0: hi:   
18, btch:   3 usd:  18
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] active_anon:3826 
inactive_anon:63 isolated_anon:0
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375]  active_file:683 
inactive_file:561 isolated_file:0
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375]  unevictable:0 
dirty:0 writeback:0 unstable:0
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375]  free:96 
slab_reclaimable:408 slab_unreclaimable:7706
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375]  mapped:501 
shmem:109 pagetables:142 bounce:0
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Normal free:384kB 
min:1016kB low:1268kB high:1524kB active_anon:15304kB inactive_anon:252kB 
active_file:2732kB inactive_file:2244kB unevictable:0kB isolated(anon):0kB 
isolated(file):0kB present:65024kB mlocked:0k
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] lowmem_reserve[]: 
0 0
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Normal: 42*4kB 
15*8kB 0*16kB 1*32kB 1*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 
= 384kB
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] 1353 total 
pagecache pages
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] 0 pages in swap 
cache
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Swap cache stats: 
add 0, delete 0, find 0/0
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Free swap  = 0kB
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Total swap = 0kB
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] 16384 pages RAM
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] 965 pages reserved
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] 1399 pages shared
Jun  1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] 14306 pages 
non-shared
Jun  1 23:41:09 nacktmulle kern.warn kernel: [185429.484375] SLUB: Unable to 
allocate memory on node -1 (gfp=0x20)
Jun  1 23:41:09 nacktmulle kern.warn kernel: [185429.484375]   cache: 
kmalloc-2048, object size: 2048, buffer size: 2048, default order: 2, min 
order: 0
Jun  1 23:41:09 nacktmulle kern.warn kernel: [185429.484375]   node 0: slabs: 
0, objs: 0, free: 0

But the box seems to survive this… Heck, it even survives my test case with 
16000 KB of /tmp used. Under that amount of memory pressure named and ntpd get 
killed, but the router does not go into an automatic reboot; it just stays up 
and running, albeit somewhat useless without named.
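
The Mem-Info figures in the dump above can be cross-checked with a little 
arithmetic. Below is a minimal Python sketch, assuming 4 kB pages (an 
assumption, though consistent with free:96 pages being reported as 384kB). 
The names are made up for illustration; the numbers are copied from the log.

```python
PAGE_KB = 4  # assumption: 4 kB pages (96 free pages * 4 kB == 384 kB in the log)


def pages_to_kb(pages, page_kb=PAGE_KB):
    """Convert a page count from the Mem-Info dump to kB."""
    return pages * page_kb


# Page counts copied from the Mem-Info dump.
mem_info_pages = {
    "free": 96,
    "active_anon": 3826,
    "slab_unreclaimable": 7706,
    "ram_total": 16384,
}

# Non-empty entries of the "Normal:" free list, as (count, block size in kB).
buddy_free = [(42, 4), (15, 8), (1, 32), (1, 64)]


def buddy_total_kb(blocks):
    """Sum the free memory described by the per-order free list."""
    return sum(count * size_kb for count, size_kb in blocks)


if __name__ == "__main__":
    for name, pages in mem_info_pages.items():
        print(f"{name}: {pages} pages = {pages_to_kb(pages)} kB")
    print(f"free list total: {buddy_total_kb(buddy_free)} kB")
```

If the 4 kB page assumption holds, slab_unreclaimable:7706 works out to 
30824 kB, i.e. roughly 30 MB of the 64 MB of RAM was held in unreclaimable 
kernel slab when the dump was taken.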



> 
>>      Oh I agree the /tmp issue is a tangent, but it does not seem healthy 
>> that the router spirals into reboot once /tmp fills up (BTW if I remove my 
>> 30000KB file from /tmp while the first OOM is in process the router 
>> recovers). My hunch is that the almost fully instantiated tmpfs takes too 
>> much memory from the system for it to handle its usual business.
>>      On top of that are the wireless issues, say what about a kernel memory 
>> leak caused by ath wireless that grows and grows until the problematic /tmp 
>> size is in the single digit MBs that starts the spiral to reboot?
> 
> No, definitely not healthy!  I'm thinking that maybe setting tmpfs to 20MB 
> would be a good compromise, at least until the presumed memory leak can be 
> tracked down.

        The way I interpret my latest test results is that the "assumed leak" 
should be restricted to the wireless driver, does that sound right to you? Also, 
with cerowrt 3.3.6-2 even 16MB seems too much for /tmp. I will see what happens 
if I add some swap space to the router; I hope it will be quite happy with 31MB 
/tmp and actual usage of that space :). Since Dave only recommends full tftp 
reflashes, maybe the update scenario is not such a big issue for cerowrt?
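
For anyone wanting to repeat the /tmp fill test, here is a minimal Python 
sketch of writing a given number of 1KB blocks to a file. It assumes a Python 
interpreter is available on the machine under test, which may well not be the 
case on the router itself; there, something like dd with bs=1k and a matching 
count should do the same job. The function name and the file path in the note 
below are hypothetical.

```python
import os
import tempfile


def fill_file(path, blocks, block_size=1024):
    """Write `blocks` zero-filled blocks of `block_size` bytes to `path`.

    Returns the number of bytes written.
    """
    chunk = b"\0" * block_size
    written = 0
    with open(path, "wb") as f:
        for _ in range(blocks):
            written += f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # make sure the data has really been handed to the filesystem
    return written


if __name__ == "__main__":
    # Harmless demo: 16 blocks into a scratch directory, not the real /tmp test.
    with tempfile.TemporaryDirectory() as scratch:
        target = os.path.join(scratch, "fill.bin")
        print(fill_file(target, 16), "bytes written to", target)
```

For the test case described above the call would be along the lines of 
fill_file("/tmp/fill.bin", 16000). Be warned that on a box with /tmp on tmpfs 
this eats RAM and can trigger exactly the OOM behaviour discussed here.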

> 
>>> I'm thinking that maybe flooding wireless->wired with UDP traffic for
>>> 5-10 minutes is the right approach, and then vice-versa (restarting
>>> the router inbetween?).  If there are problems like infinite retries
>>> or packet memory leaks, that might show them up quickly.
>>      That sounds like the right way to proceed, except I am no expert at 
>> setting netperf up so that might take a while until I get around to actually 
>> testing that hypothesis. (Do you by any chance know a publicly available 
>> netserver process running on the internet to which I could point a local 
>> netperf, and do you have any recommendations how to create the UDP flood 
>> with netperf?)
>> 
>> 
> 
> I don't know of any myself.  There's a possible tutorial on setting it up at 
> http://www.tonymacx86.com/viewtopic.php?t=5700, but assuming you have it 
> installed on two computers already, it should just be a case of running:
> 
> user@computer1$ netperf -t UDP_STREAM -H computer2
> 
> and possibly running "netserver -p 12865" on computer2 if necessary.  (It 
> should in theory be started via inetd.)


        I am still trying to get a second machine on my network so I can test 
the UDP hypothesis, but that will take a while longer…
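
Once the second machine is there, the UDP test could be scripted roughly as 
below. This is only a sketch: it assumes netperf is installed locally and 
netserver is already listening on the remote host on port 12865. It uses 
netperf's -H (remote host), -p (control port), -t (test name) and -l (test 
length in seconds) options; the 300 second default is my reading of the 
"5-10 minutes" suggestion, and the function names are hypothetical.

```python
import subprocess


def netperf_udp_cmd(host, seconds=300, port=12865):
    """Build the argument list for a netperf UDP_STREAM run against `host`."""
    return [
        "netperf",
        "-H", host,           # remote machine running netserver
        "-p", str(port),      # netserver control port
        "-t", "UDP_STREAM",   # unidirectional UDP stream test
        "-l", str(seconds),   # test length in seconds
    ]


def run_udp_flood(host, seconds=300, port=12865):
    """Run the test and return the CompletedProcess (requires netperf on PATH)."""
    return subprocess.run(
        netperf_udp_cmd(host, seconds, port),
        capture_output=True,
        text=True,
        check=False,
    )


if __name__ == "__main__":
    # Only prints the command line; nothing is sent over the network.
    print(" ".join(netperf_udp_cmd("computer2")))
```

Running it once from the wireless client towards the wired machine and once 
the other way round would cover both directions Robert suggested.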

Best
        Sebastian

_______________________________________________
Cerowrt-devel mailing list
[email protected]
https://lists.bufferbloat.net/listinfo/cerowrt-devel
