Hi Roman,

Thanks for the E1000E patch link. I'll give that a try.

Here's a copy of what shows up through the serial port when I do click-install, up through the point where the system freezes:

[ 126.137863] click: starting router thread pid 3920 (ffff81021d5827c0)
[ 126.283549] Unable to handle kernel NULL pointer dereference at 0000000000000008 RIP:
[ 126.289028] [<ffffffff803ca0d7>] pfifo_fast_dequeue+0x48/0x69
[ 126.297312] PGD 21d4c6067 PUD 21b94c067 PMD 0
[ 126.301791] Oops: 0002 [1] SMP
[ 126.304957] CPU 6
[ 126.306980] Modules linked in: click proclikefs nls_utf8 nls_cp437 vfat fat nls_base appletalk nfsd auth_rpcgss exportfs n
[ 126.360046] Pid: 0, comm: swapper Not tainted 2.6.24.7-click-amd64 #1
[ 126.366471] RIP: 0010:[<ffffffff803ca0d7>] [<ffffffff803ca0d7>] pfifo_fast_dequeue+0x48/0x69
[ 126.374990] RSP: 0018:ffff81021f207eb8 EFLAGS: 00010246
[ 126.380283] RAX: 0000000000000000 RBX: ffff81021b9fa000 RCX: ffff81021c927a80
[ 126.387400] RDX: ffff81021b8b42f0 RSI: ffff81021b9fa9c8 RDI: ffff81021b8b4200
[ 126.394517] RBP: ffff81021b9fa000 R08: 0000000000000000 R09: ffffffff805a4180
[ 126.401635] R10: 0000000000000001 R11: ffff81021f1f5278 R12: 0000000000000000
[ 126.408752] R13: 0000000000000009 R14: ffff81021b9fa300 R15: ffff81021b9fa280
[ 126.415870] FS: 0000000000000000(0000) GS:ffff81021f1b3f40(0000) knlGS:0000000000000000
[ 126.423939] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 126.429672] CR2: 0000000000000008 CR3: 000000021d4b8000 CR4: 00000000000006e0
[ 126.436789] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 126.443906] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 126.451015] Process swapper (pid: 0, threadinfo ffff81021f1fe000, task ffff81021f1faa90)
[ 126.459074] Stack: ffffffff803ca953 0000000000000040 00000000ffff256e ffff81021b9fa040
[ 126.467134] ffff81021b9fa000 ffff81021b9fa040 0000000000000000 0000000000000009
[ 126.474572] 0000000000000006 0000000000000000 ffffffff803b7b11 ffffffff80517120
[ 126.481820] Call Trace:
[ 126.484448] <IRQ> [<ffffffff803ca953>] __qdisc_run+0x94/0x1e5
[ 126.490376] [<ffffffff803b7b11>] net_tx_action+0xbc/0xe4
[ 126.495763] [<ffffffff8023cedd>] __do_softirq+0x5c/0xc2
[ 126.501063] [<ffffffff8020a000>] default_idle+0x0/0x3d
[ 126.506278] [<ffffffff8020d0fc>] call_softirq+0x1c/0x28
[ 126.511580] [<ffffffff8020e784>] do_softirq+0x2c/0x7d
[ 126.516706] [<ffffffff8023ccbe>] irq_exit+0x3f/0x84
[ 126.521653] [<ffffffff8020e9b0>] do_IRQ+0xb7/0xd4
[ 126.526436] [<ffffffff8020a087>] mwait_idle+0x0/0x45
[ 126.531478] [<ffffffff8020a087>] mwait_idle+0x0/0x45
[ 126.536521] [<ffffffff8020c481>] ret_from_intr+0x0/0xa
[ 126.541733] <EOI> [<ffffffff8020a0c9>] mwait_idle+0x42/0x45
[ 126.547486] [<ffffffff8020b0e6>] cpu_idle+0x95/0xde
[ 126.552446]
[ 126.553936]
[ 126.553936] Code: 48 89 50 08 48 c7 01 00 00 00 00 48 c7 41 08 00 00 00 00 8b
[ 126.562981] RIP [<ffffffff803ca0d7>] pfifo_fast_dequeue+0x48/0x69
[ 126.569166] RSP <ffff81021f207eb8>
[ 126.572651] CR2: 0000000000000008
[ 126.575962] ---[ end trace 32f8f92d27157251 ]---
[ 126.580564] Kernel panic - not syncing: Aiee, killing interrupt handler!

I searched for pfifo_fast_dequeue in the Kernel source and I think it wound up being in the networking code. I don't know what else to check right now.

-Tom

On Mon, Mar 23, 2009 at 10:27 AM, Roman Chertov <[email protected]> wrote:

> Tom,
>
> http://www.mail-archive.com/[email protected]/msg02730.html
> This is the e1000e driver that Joonwoo released. When you use it, you
> need to use PollDevice instead of FromDevice Click elements. As far as
> crashes go, it would help to see the dmesg output when the crash
> happens. You might be able to see the messages in /var/log/messages
> even after the reboot. The other way is to use a serial console.
>
> Roman
>
> Tom Gibson wrote:
>
>> Hi All,
>>
>> I'm not sure what versions of things I should be using and what versions
>> others use.
>>
>> For my Kernel I'm using the latest Click Kernel patch with 2.6.24.7 64-bit
>> on a dual E5410 Xeon server. My main (only) issue right now is that I get
>> Kernel lockups (system freezes and keyboard LEDs just blink) too often.
>> It sometimes happens randomly when the system is idle. It also happens
>> every time I try to transmit data too close to line rate (4x 1Gig) using
>> the fast UDP source element. I'm thinking maybe it's a bug that's fixed
>> in a newer Kernel version. I'm researching debugging this sort of thing
>> over the serial port, so I'll probably have more details soon.
>>
>> For my E1000 driver I use the version that comes in the 2.6.24.7 Kernel w/
>> NAPI enabled (no click polling mode patch). I also tried compiling the
>> latest stable E1000E driver (no click polling mode patches) and still got
>> the Kernel freeze when transmitting too fast. I ran into issues trying to
>> compile the patched E1000 driver in my Click directory. First it
>> complained about the Makefile modifying CFLAGS, so I updated it to be more
>> like the Makefile of the current E1000 driver. That fixed that problem,
>> but it still failed to compile, complaining about unknown fields in some
>> of the main network structs.
>>
>> I saw the latest stable Intel NIC drivers use an updated driver called
>> E1000E for newer PCIe cards. Would it be a good idea to use this new
>> driver and migrate the Click polling mode patches to it? How does NAPI
>> support in the Intel drivers relate to Click's custom polling mode
>> patches?
>>
>> I haven't worked with patches in Linux before besides applying them. I'm
>> not sure how difficult it would be and what a good way would be to migrate
>> the Click supplied patches to the newer versions of the Kernel and Intel
>> NIC drivers. Does anyone have some advice on how to go about this?
>>
>> Thanks,
>>
>> Tom
>> _______________________________________________
>> click mailing list
>> [email protected]
>> https://amsterdam.lcs.mit.edu/mailman/listinfo/click
>>
>
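
A note on reading the oops in this message: the following is a minimal sketch, not a diagnosis, of how the "Oops: 0002" code and the CR2/RAX values can be decoded. It assumes the standard x86 page-fault error-code bit layout; the function name decode_oops_error_code is hypothetical, and the register values are copied from the log. The result (a kernel-mode write to offset 8 from a NULL pointer) would be consistent with a store through a NULL list pointer while dequeuing a packet, for example a "prev" field at offset 8 of an sk_buff, but that interpretation is an assumption and is not confirmed by the log itself.

```python
def decode_oops_error_code(code: int) -> dict:
    """Decode an x86 page-fault error code as printed after 'Oops:'.

    Hypothetical helper for illustration; the bit meanings follow the
    standard x86 page-fault error code layout.
    """
    return {
        # bit 0: 0 = page not present, 1 = protection violation
        "cause": "protection violation" if code & 0x1 else "page not present",
        # bit 1: 0 = read access, 1 = write access
        "access": "write" if code & 0x2 else "read",
        # bit 2: 0 = kernel mode, 1 = user mode
        "mode": "user" if code & 0x4 else "kernel",
        # bit 3: reserved bit set in a page-table entry
        "reserved_bit_set": bool(code & 0x8),
        # bit 4: fault was caused by an instruction fetch
        "instruction_fetch": bool(code & 0x10),
    }


# Values copied from the log; the oops code is printed in hexadecimal.
info = decode_oops_error_code(int("0002", 16))
fault_addr = 0x0000000000000008  # CR2: the address that faulted
rax = 0x0000000000000000         # RAX at the time of the fault

# Assumption: RAX held the base pointer of the faulting store, so the
# difference is the field offset being written.
offset = fault_addr - rax

print(info)
print(f"fault at base {rax:#x} + offset {offset:#x}")
```

If that reading is right, the next thing to look for would be whatever leaves the transmit queue's packet list in an inconsistent state, for example two contexts touching the same queue without the expected locking.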
