Hi Robert,
it took me some time to get a bit further with more testing...
On May 25, 2012, at 3:38 PM, Robert Bradley wrote:
> On 25/05/12 19:25, Sebastian Moeller wrote:
>> Hi Robert,
>>
>>
>> On May 25, 2012, at 4:11 AM, Robert Bradley wrote:
>>
>>> That said, unless we can
>>> find an obvious reason for /tmp overfilling, I'm not sure we should do
>>> that, since it will cause problems upgrading.
>> But if I create a file of 30000 1KB blocks in /tmp (so that around 400
>> KB stay available), the router goes into OOM, so I do not think that
>> upgrading would work well if it really needs so much memory? I have a hunch
>> that the OpenWrt base under cerowrt does not assume something as big and
>> demanding as the 11MB bind9 named process running :)
> The flash memory size is about 16MB for the WNDR3700, so it's probably ok for
> normal use. It's less certain with BIND and everything else running,
> although it'd be possible to restart the router, stop BIND and then update.
From my totally unscientific testing I am quite convinced that even
16MB of /tmp used will make the router spiral into reboot if the router is used
over the 5GHz radio to the WAN port. However, if I use one of the wired ports I
get plenty of the following (the failing process is not always hostapd):
Jun 1 23:41:08 nacktmulle kern.warn kernel: [185428.417968] hostapd: page
allocation failure: order:0, mode:0x4020
Jun 1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] Call Trace:
Jun 1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<802850a4>]
dump_stack+0x8/0x34
Jun 1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<800b4548>]
warn_alloc_failed+0xe8/0x10c
Jun 1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<800b684c>]
__alloc_pages_nodemask+0x5a0/0x600
Jun 1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<800da070>]
new_slab+0xa8/0x280
Jun 1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<80286b18>]
__slab_alloc.isra.60.constprop.63+0x25c/0x2fc
Jun 1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<800dba48>]
__kmalloc_track_caller+0x88/0x140
Jun 1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<801e0854>]
__alloc_skb+0x80/0x140
Jun 1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<801e0930>]
dev_alloc_skb+0x1c/0x48
Jun 1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<801d0c74>]
ag71xx_poll+0x430/0x65c
Jun 1 23:41:08 nacktmulle kern.alert kernel: [185428.417968] [<801e8c10>]
net_rx_action+0x88/0x1c8
Jun 1 23:41:09 nacktmulle kern.warn kernel: [185429.484375] hostapd: page
allocation failure: order:0, mode:0x4020
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Call Trace:
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<802850a4>]
dump_stack+0x8/0x34
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<800b4548>]
warn_alloc_failed+0xe8/0x10c
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<800b684c>]
__alloc_pages_nodemask+0x5a0/0x600
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<800da070>]
new_slab+0xa8/0x280
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<80286b18>]
__slab_alloc.isra.60.constprop.63+0x25c/0x2fc
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<800dba48>]
__kmalloc_track_caller+0x88/0x140
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<801e0854>]
__alloc_skb+0x80/0x140
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<801e0930>]
dev_alloc_skb+0x1c/0x48
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] [<801d0c74>]
ag71xx_poll+0x430/0x65c
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375]
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Mem-Info:
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Normal per-cpu:
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] CPU 0: hi:
18, btch: 3 usd: 18
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] active_anon:3826
inactive_anon:63 isolated_anon:0
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] active_file:683
inactive_file:561 isolated_file:0
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] unevictable:0
dirty:0 writeback:0 unstable:0
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] free:96
slab_reclaimable:408 slab_unreclaimable:7706
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] mapped:501
shmem:109 pagetables:142 bounce:0
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Normal free:384kB
min:1016kB low:1268kB high:1524kB active_anon:15304kB inactive_anon:252kB
active_file:2732kB inactive_file:2244kB unevictable:0kB isolated(anon):0kB
isolated(file):0kB present:65024kB mlocked:0k
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] lowmem_reserve[]:
0 0
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Normal: 42*4kB
15*8kB 0*16kB 1*32kB 1*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB
= 384kB
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] 1353 total
pagecache pages
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] 0 pages in swap
cache
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Swap cache stats:
add 0, delete 0, find 0/0
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Free swap = 0kB
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] Total swap = 0kB
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] 16384 pages RAM
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] 965 pages reserved
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] 1399 pages shared
Jun 1 23:41:09 nacktmulle kern.alert kernel: [185429.484375] 14306 pages
non-shared
Jun 1 23:41:09 nacktmulle kern.warn kernel: [185429.484375] SLUB: Unable to
allocate memory on node -1 (gfp=0x20)
Jun 1 23:41:09 nacktmulle kern.warn kernel: [185429.484375] cache:
kmalloc-2048, object size: 2048, buffer size: 2048, default order: 2, min
order: 0
Jun 1 23:41:09 nacktmulle kern.warn kernel: [185429.484375] node 0: slabs:
0, objs: 0, free: 0
But the box seems to survive this… Heck, it even survives my test case with
16000 KB of /tmp used. Under that amount of memory pressure named and ntpd get
killed, but the router does not go into an automatic reboot; it just stays up
and running, albeit somewhat useless without named.
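
As a cross-check of the Mem-Info dump above, here is a minimal sketch of the arithmetic in Python. It assumes 4 kB pages (consistent with "16384 pages RAM" on a 64 MB box); the helper names are made up for illustration. The numbers are copied from the log: the buddy list sums to the reported 384 kB free, and slab_unreclaimable comes to roughly 30 MB, which would fit a kernel-side leak but does not prove one.

```python
import re

PAGE_KB = 4  # assumption: 4 kB pages (16384 pages RAM = 64 MB)


def buddy_free_kb(line):
    """Sum the 'count*sizekB' terms of a buddy-allocator line from Mem-Info."""
    return sum(int(n) * int(size) for n, size in re.findall(r"(\d+)\*(\d+)kB", line))


def pages_to_kb(pages):
    """Convert a page count from Mem-Info to kB."""
    return pages * PAGE_KB


# Line copied from the log above (re-joined after mail wrapping).
buddy = ("Normal: 42*4kB 15*8kB 0*16kB 1*32kB 1*64kB 0*128kB 0*256kB "
         "0*512kB 0*1024kB 0*2048kB 0*4096kB = 384kB")

print(buddy_free_kb(buddy))   # 384   (matches "Normal free:384kB")
print(pages_to_kb(96))        # 384   (free:96 pages)
print(pages_to_kb(16384))     # 65536 (16384 pages RAM)
print(pages_to_kb(7706))      # 30824 (slab_unreclaimable:7706 pages)
```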
>
>> Oh I agree the /tmp issue is a tangent, but it does not seem healthy
>> that the router spirals into reboot once /tmp fills up (BTW if I remove my
>> 30000KB file from /tmp while the first OOM is in progress the router
>> recovers). My hunch is that the almost fully instantiated tmpfs takes too
>> much memory from the system for it to handle its usual business.
>> On top of that are the wireless issues, say what about a kernel memory
>> leak caused by ath wireless that grows and grows until the problematic /tmp
>> size is in the single digit MBs that starts the spiral to reboot?
>
> No, definitely not healthy! I'm thinking that maybe setting tmpfs to 20MB
> would be a good compromise, at least until the presumed memory leak can be
> tracked down.
The way I interpret my latest test results is that the "assumed leak"
should be restricted to the wireless driver; does that sound right to you? Also,
with cerowrt 3.3.6-2 even 16MB seems too much for /tmp. I will see what happens
if I add some swap space to the router; I hope it will be quite happy with 31MB
/tmp and actual usage of that space :). Since Dave only recommends full tftp
reflashes, maybe the update scenario is not such a big issue for cerowrt?
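
For anyone wanting to repeat the /tmp fill test, here is a minimal sketch in Python, assuming a host where Python is available (on the router itself a dd one-liner would be the usual tool). The function names and the file name "fill.bin" are made up for illustration; the block count and 1 KB block size mirror the figures mentioned in this thread.

```python
import os


def fill_dir(path, blocks, block_size=1024, name="fill.bin"):
    """Write `blocks` blocks of `block_size` zero bytes to path/name.

    Returns the path of the file that was written.
    """
    target = os.path.join(path, name)
    block = b"\0" * block_size
    with open(target, "wb") as f:
        for _ in range(blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())
    return target


def free_kb(path):
    """Space still available to unprivileged users on the filesystem, in kB."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize // 1024


# Hypothetical usage, not run here:
#   print(free_kb("/tmp"))
#   fill_dir("/tmp", 16000)   # about 16000 KB, as in the test case above
#   print(free_kb("/tmp"))
```

Removing the file again (os.remove on the returned path) releases the space, which is the recovery step described earlier in the thread.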
>
>>> I'm thinking that maybe flooding wireless->wired with UDP traffic for
>>> 5-10 minutes is the right approach, and then vice-versa (restarting
>>> the router in between?). If there are problems like infinite retries
>>> or packet memory leaks, that might show them up quickly.
>> That sounds like the right way to proceed, except I am no expert at
>> setting netperf up, so it might take a while until I get around to actually
>> testing that hypothesis. (Do you by any chance know a publicly available
>> netserver process running on the internet to which I could point a local
>> netperf, and do you have any recommendations on how to create the UDP flood
>> with netperf?)
>>
>>
>
> I don't know of any myself. There's a possible tutorial on setting it up at
> http://www.tonymacx86.com/viewtopic.php?t=5700, but assuming you have it
> installed on two computers already, it should just be a case of running:
>
> user@computer1$ netperf -t UDP_STREAM -H computer2
>
> and possibly running "netserver -p 12865" on computer2 if necessary. (It
> should in theory be started via inetd.)
I am still trying to get a second machine on my network so I can test
the UDP hypothesis, but that will take a while longer…
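
Until netperf is set up on both ends, a crude UDP sender could stand in for the flood. The sketch below is not netperf and measures nothing; it only pushes datagrams at a target for a fixed time. The function name, the port, and the 1400-byte payload are assumptions chosen for illustration.

```python
import socket
import time


def udp_flood(host, port, seconds, payload_size=1400):
    """Send UDP datagrams to (host, port) as fast as possible for `seconds`.

    Returns the number of datagrams handed to the kernel; this says nothing
    about how many actually arrived.
    """
    payload = b"x" * payload_size
    sent = 0
    deadline = time.monotonic() + seconds
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        while time.monotonic() < deadline:
            try:
                sock.sendto(payload, (host, port))
                sent += 1
            except OSError:
                # Local send failure (for example full buffers): back off briefly.
                time.sleep(0.001)
    return sent


# Hypothetical usage, not run here: five minutes towards the second machine.
#   udp_flood("computer2", 5001, 300)
```

Running it once from the wireless side and once from the wired side, with a router restart in between, would follow the plan quoted above.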
Best
Sebastian