Hey,

As predicted, the commit has no effect in this case: 2.6.31.12 warns and 
2.6.24.7 crashes with the same traces as before. The ToDevice test cases do not 
cause any crashes; the most I got were warnings on 2.6.31.12 (attached) and 
nothing out of the ordinary on 2.6.24.7 (manual patch). The warnings appeared 
with "InfiniteSource -> [Queue ->] ToDevice" (single device).

However, I did find something interesting. Apparently, both the newer e1000e 
driver in 2.6.31.12 and the e1000e-0.4.1.7 driver use NAPI by default. To rule 
NAPI out, I recompiled the 0.4.1.7 driver for a manually patched 2.6.24.7 
kernel. The result: no kernel crash in the original setup (endhost - switch - 
switch - endhost). In hindsight, I should have tried this first :)

On Feb 10, 2010, at 10:36 PM, Eddie Kohler wrote:

> Hi Nuutti,
> 
> There is a small chance this commit will fix your issue:
> 
> http://www.read.cs.ucla.edu/gitweb?p=click;a=commit;h=01c8f4e084036338e83a6bff7a8e74dc49caa014
> 
> If it does not, I think we need more input from you to narrow it down...
> 
> Thanks so much,
> Eddie
> 
> 
> Eddie Kohler wrote:
>> Nuutti,
>> Thanks very much for these dumps and this config.  Pretty informative.
>> Here are some debugging suggestions.
>> (0) This distinctly looks like memory corruption, possibly within ToDevice.  
>> I will look at Queue itself, as well, but this seems like an unlikely source 
>> of problems, since your Click is not installed with --enable-multithread.
>> (1) Perhaps the problem is with EtherSwitch, whose internal hash table may 
>> be causing problems in SMP settings.  Can you try again, replacing the 
>> EtherSwitch element with a Hub element?  This will do the same job, but 
>> without a table.  My expectation is this will also fail.
>> (2) To narrow down the problem, we can try very simple ToDevice and Queue 
>> configs.  This would involve:
>> - an ia32 build
>> - either the manual patch or --enable-fixincludes
>> - an SMP kernel
>> - the following configs:
>> InfiniteSource(DATA \<plausible-data-for-an-ethernet-packet>)
>> -> ToDevice(eth0);
>> -*- OR
>> InfiniteSource(DATA \<plausible-data-for-an-ethernet-packet>)
>> -> Queue
>> -> ToDevice(eth0);
>> -*- OR
>> InfiniteSource(DATA \<plausible-data-for-an-ethernet-packet>)
>> -> ToDevice(eth0);
>> InfiniteSource(DATA \<plausible-data-for-an-ethernet-packet>)
>> -> ToDevice(eth1);
>> -*- OR
>> InfiniteSource(DATA \<plausible-data-for-an-ethernet-packet>)
>> -> Queue
>> -> ToDevice(eth0);
>> InfiniteSource(DATA \<plausible-data-for-an-ethernet-packet>)
>> -> Queue
>> -> ToDevice(eth1);
>> ------
>> These configs test ToDevice with and without Queues, and with and without 
>> accessing two devices.
>> We'll look in parallel, but I'm interested in what you see.
>> Eddie
>> Nuutti Varis wrote:
>>> Hey, 
>>> While running throughput measurements with kernel-mode Click, using a 
>>> simple EtherSwitch configuration (attached as etherswitch.click) in the 
>>> following topology:
>>> 
>>> EndHostA::ethI0 <==> ethI0::EtherSwitch1::ethI1 <==>
>>>     ethI1::EtherSwitch2::ethI0 <==> ethI0::EndHostB
>>> 
>>> 192.168.2.1 ------------------------------------------------> 192.168.2.2
>>> FastUDPSrc, 64-byte packets, 300 kpps
>>> 
>>> I stumbled upon a kernel crash, apparently when the Queue elements started 
>>> dropping packets due to overflow. I tried this with two kernel versions 
>>> (2.6.31.12 and 2.6.24.7), using either the 2.6.24.7 manual patch or 
>>> --enable-fixincludes. Interestingly, the crash does not happen when I 
>>> disable SMP in the kernel. Additionally, normal Linux bridging does not 
>>> crash the kernel on overflows. Partial and full crash dumps from various 
>>> days of testing are attached.
>>> 
>>> Configuration details of EtherSwitch{1,2}:
>>> - The architecture of each dump (amd64 or ia32) is indicated in its filename
>>> - The MTU of ethI1 is 1540 (also tried 1500, no difference)
>>> - Click is configured with --enable-linuxmodule --enable-userlevel 
>>> --enable-etherswitch [--enable-fixincludes]
>>> - Kernel preemption is disabled
>>> - Both the poll-patched and the vanilla e1000e drivers cause the problem
>>> - e1000e versions 0.4.1.7 and 1.0.2-k2 (shipped with 2.6.31.12) both cause 
>>> the problem
>>> 
>>> --
>>> Nuutti Varis ([email protected])
>>> PhD Student, Aalto University School of Science and Technology
>>> Department of Communications and Networking
>>> 
>>> _______________________________________________
>>> click mailing list
>>> [email protected]
>>> https://amsterdam.lcs.mit.edu/mailman/listinfo/click

Attachment: kernel_warn.100210.linux-2.6.31.12.ia32.enable_fixincludes.dump
Description: Binary data

Attachment: kernel_warn.100210.linux-2.6.31.12.ia32.enable_fixincludes.2.dump
Description: Binary data

--
Nuutti Varis ([email protected])
PhD Student, Aalto University School of Science and Technology
Department of Communications and Networking


