Nuutti,
Thanks very much for these dumps and this config. Pretty informative.
Here are some debugging suggestions.
(0) This distinctly looks like memory corruption, possibly within ToDevice. I
will look at Queue itself as well, but it seems an unlikely source of
problems, since your Click is not configured with --enable-multithread.
(1) Perhaps the problem is with EtherSwitch, whose internal hash table may be
causing problems in SMP settings. Can you try again, replacing the
EtherSwitch element with a Hub element? This will do the same job, but
without a table. My expectation is this will also fail.
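
For concreteness, here is a rough sketch of what the Hub variant might look
like. I am guessing at the shape of your etherswitch.click, so the device
names, the use of FromDevice, and the default Queue sizes are all assumptions;
keep whatever your real config uses and swap only the EtherSwitch line.

```click
// Sketch only: device names and surrounding elements are assumed, not
// taken from the attached etherswitch.click.
hub :: Hub;                        // was: an EtherSwitch element

FromDevice(eth0) -> [0]hub;
FromDevice(eth1) -> [1]hub;

hub[0] -> Queue -> ToDevice(eth0);
hub[1] -> Queue -> ToDevice(eth1);
```

If this crashes the same way, the EtherSwitch table is off the hook.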
(2) To narrow down the problem, we can try very simple ToDevice and Queue
configs. This would involve:
- ia32
- either patch or fixincludes
- SMP kernel
- The following configs:

    InfiniteSource(DATA \<plausible-data-for-an-ethernet-packet>)
      -> ToDevice(eth0);

  -*- OR

    InfiniteSource(DATA \<plausible-data-for-an-ethernet-packet>)
      -> Queue
      -> ToDevice(eth0);

  -*- OR

    InfiniteSource(DATA \<plausible-data-for-an-ethernet-packet>)
      -> ToDevice(eth0);
    InfiniteSource(DATA \<plausible-data-for-an-ethernet-packet>)
      -> ToDevice(eth1);

  -*- OR

    InfiniteSource(DATA \<plausible-data-for-an-ethernet-packet>)
      -> Queue
      -> ToDevice(eth0);
    InfiniteSource(DATA \<plausible-data-for-an-ethernet-packet>)
      -> Queue
      -> ToDevice(eth1);

------
These configs test ToDevice with and without Queues, and with and without
accessing two devices.
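
In case it helps, here is the first config with the placeholder filled in.
The frame contents are purely my assumption for illustration: a 60-byte frame
with a broadcast destination, a made-up locally administered source address,
the IEEE local experimental EtherType 0x88B5, and zero padding. Substitute
real addresses for your testbed if broadcast traffic is a problem there.

```click
// Sketch only: the packet bytes below are an assumed example, not data
// from the crash dumps.
InfiniteSource(DATA \<
    ff ff ff ff ff ff  02 00 00 00 00 01  88 b5  00 00
    00 00 00 00 00 00  00 00 00 00 00 00  00 00  00 00
    00 00 00 00 00 00  00 00 00 00 00 00  00 00  00 00
    00 00 00 00 00 00  00 00 00 00 00 00>)
  -> ToDevice(eth0);
```

The other three configs take the same DATA argument unchanged.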
We'll look in parallel, but I'm interested in what you see.
Eddie
Nuutti Varis wrote:
> Hey,
>
> While trying to run throughput measurements with Click in a kernel, running a
> simple EtherSwitch configuration (attached as etherswitch.click) in a
> topology of:
>
> EndHostA::ethI0 <==> ethI0::EtherSwitch1::ethI1 <==>
> ethI1::EtherSwitch2::ethI0 <==> ethI0::EndHostB
>
> 192.168.2.1 ----------> 192.168.2.2
> FastUDPSrc w/ 64B packet, 300kpp/s
>
> I stumbled upon a kernel crash, seemingly when the Queue elements started
> dropping packets due to overflow. I tried this with two different kernel
> versions (2.6.31.12 and 2.6.24.7) and with either 2.6.24.7 manual patch, or
> with --enable-fixincludes. Interestingly, the kernel crash does not happen
> when I disable SMP from the kernel. Additionally, normal linux bridging does
> not crash the kernel on overflows. Partial/full crash dumps as attachments
> from various days of testing.
>
> Configuration stuff of the EtherSwitch{1,2}:
> - Dumps arch indicated in the filename, either amd64 or ia32
> - MTU of ethI1 is 1540 (tried with 1500 as well, no difference)
> - Click is configured with --enable-linuxmodule --enable-userlevel
> --enable-etherswitch [--enable-fixincludes]
> - Kernel does not have any pre-empting enabled.
> - Both e1000e poll-patched and vanilla cause the problem
> - e1000e versions 0.4.1.7 and 1.0.2-k2 (comes with 2.6.31.12) cause the
> problem
>
>
> --
> Nuutti Varis ([email protected])
> PhD Student, Aalto University School of Science and Technology
> Department of Communications and Networking
>
> _______________________________________________
> click mailing list
> [email protected]
> https://amsterdam.lcs.mit.edu/mailman/listinfo/click