Hi,

The kernel oops happens when rx_poll() returns NULL while got is > 1 after it encounters an error. That's a bug, and it's easy to fix.
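Below is a standalone user-space C sketch of the invariant the rx_poll() fix has to restore. It is not the actual patch. The count handed back to the caller must always equal the number of buffers on the returned list. A buffer is counted only after it has been linked, so an error path can never leave got > 0 with a NULL head. All names (struct pkt, struct rx_desc, rx_poll_sketch(), consume_sketch()) are hypothetical stand-ins for struct sk_buff, the e1000e descriptor ring, e1000_rx_poll() and the PollDevice loop. The consumer assumes, as the oops suggests, that the caller walks exactly got entries.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff: only the list link matters. */
struct pkt {
    struct pkt *next;
    int id;                   /* for identification only */
};

/* Hypothetical descriptor: 'done' mimics a completion bit, 'err' an error. */
struct rx_desc {
    int done;
    int err;
    struct pkt *buf;
};

/*
 * Collect up to 'want' completed packets from 'ring' (n entries).
 * Invariant: *got always equals the number of packets on the returned list.
 */
static struct pkt *rx_poll_sketch(struct rx_desc *ring, int n, int want,
                                  int *got)
{
    struct pkt *head = NULL;
    struct pkt **tail = &head;
    int count = 0;

    for (int i = 0; i < n && count < want; i++) {
        struct rx_desc *d = &ring[i];

        if (!d->done)
            break;            /* no more completed descriptors */
        if (d->err)
            continue;         /* skip the bad frame (a real driver would
                                 also recycle its buffer); do NOT count it */

        d->buf->next = NULL;
        *tail = d->buf;       /* link the buffer first... */
        tail = &d->buf->next;
        count++;              /* ...and only then count it */
    }

    *got = count;             /* head is NULL exactly when count == 0 */
    return head;
}

/* Consumer that trusts 'got', as the PollDevice loop appears to. */
static int consume_sketch(struct pkt *list, int got)
{
    int seen = 0;

    for (int i = 0; i < got; i++) {
        assert(list != NULL); /* a NULL here is the oops in this thread */
        list = list->next;
        seen++;
    }
    return seen;
}
```

Skipping a bad frame rather than stopping at the first error is one possible choice. Stopping works just as well, as long as got is set from the same counter that tracks the list.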
However, I've realized that the 82571 and later chips support packet split, and the e1000e driver configures the registers to use that feature. As a result, with this e1000e driver, the 82571 and later chips split packets when the MTU is greater than 1500. Unfortunately, I didn't implement packet split in the polling patch, so e1000_rx_poll() looks at the wrong receive descriptor when the MTU is greater than 1500. We need an additional function, something like e1000_rx_poll_ps(). It shouldn't be hard, but I'm rather busy, so I don't know when I'll have time. I'll let you know when it's done so you can test it.

Joonwoo

On Mon, Jan 18, 2010 at 2:41 AM, Nuutti Varis <[email protected]> wrote:
> On Jan 13, 2010, at 6:33 PM, Joonwoo Park wrote:
>
>> Hi Nuutti,
>>
>> On Wed, Jan 13, 2010 at 3:34 AM, Nuutti Varis <[email protected]> wrote:
>>> Hello,
>>>
>>> The e1000e polling patch for the Intel NICs (e1000e driver) seems to have
>>> issues with MTU > 1500 (we use an MTU of 1540; issues start from 1501 up).
>>> From what I could gather with very brief experience in kernel hacking, the
>>> driver in polling mode gets the E1000_RXD_STAT_EOP bit set when MTU > 1500,
>>> and e1000_rx_poll promptly goes through all the buffers in the ring with
>>> the code on lines 5173-5179 in netdev.c.
>>
>> That process looks correct.
>> When you say MTU size, do you mean the MTU of the NIC?
>> What's the packet size when you have this problem?
>
> MTU size of the NIC. The oops itself happens before any packets are received,
> i.e. the kernel oopses when I do click-install foo.click.
>
>>> The end result is an oops (NULL pointer dereference) in PollDevice::run_task,
>>> as e1000_rx_poll increments "got" before assigning the skb to skb_head.
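On the MTU > 1500 problem, here is a rough C sketch of keeping the poll routine in step with the descriptor format the NIC was configured for. The names are hypothetical and this is not e1000e code. A single decision selects both the format and the poll function. That decision uses only the condition described above: a packet-split-capable chip and an MTU greater than 1500. The real driver may check more than this. Each stub asserts that it is reading the format it expects. rx_poll_ps_sketch() stands in for the proposed e1000_rx_poll_ps().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: which receive-descriptor format the NIC was told to use. */
enum rx_desc_format { RX_DESC_LEGACY, RX_DESC_PACKET_SPLIT };

/* Hypothetical per-NIC state; not the e1000e adapter structure. */
struct nic_sketch {
    bool supports_packet_split;  /* e.g. 82571 and later, per this thread */
    int mtu;
    enum rx_desc_format format;  /* format the hardware was configured for */
    int (*rx_poll)(struct nic_sketch *nic, int want);
};

/* Stand-in for e1000_rx_poll(): only valid on legacy descriptors. */
static int rx_poll_legacy_sketch(struct nic_sketch *nic, int want)
{
    assert(nic->format == RX_DESC_LEGACY);
    (void)want;
    return 0;  /* a real poller would parse legacy descriptors here */
}

/* Stand-in for the proposed e1000_rx_poll_ps(). */
static int rx_poll_ps_sketch(struct nic_sketch *nic, int want)
{
    assert(nic->format == RX_DESC_PACKET_SPLIT);
    (void)want;
    return 0;  /* a real poller would parse packet-split descriptors here */
}

/*
 * Make one decision and use it for both the hardware setup and the poll
 * routine, so the poller cannot read descriptors in the wrong format.
 */
static void configure_rx_sketch(struct nic_sketch *nic)
{
    if (nic->supports_packet_split && nic->mtu > 1500) {
        nic->format = RX_DESC_PACKET_SPLIT;
        nic->rx_poll = rx_poll_ps_sketch;
    } else {
        nic->format = RX_DESC_LEGACY;
        nic->rx_poll = rx_poll_legacy_sketch;
    }
}
```

Deriving the function pointer from the same branch that chooses the descriptor format means the two cannot drift apart. In this thread they did drift apart: the hardware was set up for packet split, but the polling path still assumed the legacy layout.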
>>>
>>> System specs follow:
>>> - Click is the latest (at the time of writing) from GIT
>>> - Linux 100g-10-x86-64 2.6.24.7-click #1 SMP Tue Jan 12 15:10:28 EET 2010 x86_64 GNU/Linux
>>> - PREEMPT_NONE=y, {PREEMPT_VOLUNTARY, PREEMPT, PREEMPT_BKL}=n
>>> - Click configured with --enable-etherswitch --enable-linuxmodule
>>> - gcc version 4.3.2 (Debian 4.3.2-1.1)
>>> - Network Interface Card chips from lspci (we have 4x ports internally and
>>>   another quad port card)
>>>   * Intel Corporation 82574L Gigabit Network Connection
>>>   * Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
>>>
>>
>> Thanks for your very detailed explanation, but can you also give me your
>> Click config? (A simplified config with which I can reproduce this would be
>> great.)
>
> The simplest configuration with PollDevice that I could find that oopses:
>
> PollDevice( eth0 ) -> Discard;
> Idle -> ToDevice( eth0 );
>
>>> ==
>>> [ 1360.185658] Unable to handle kernel NULL pointer dereference at 00000000000000c0 RIP:
>>> [ 1360.185705] [<ffffffff88195197>] :click:_ZN10PollDevice8run_taskEP4Task+0x127/0x360
>>> [ 1360.186038] PGD 0
>>> [ 1360.186171] Oops: 0002 [1] SMP
>>> [ 1360.186287] CPU 7
>>> [ 1360.186416] Modules linked in: dot1q(PF) trill(PF) click proclikefs loop button evdev e1000e pcspkr ext3 jbd mbcache sd_mod ahci libata scsi_mod ehci_hcd uhci_hcd thermal processor fan
>>> [ 1360.187224] Pid: 22129, comm: kclick Tainted: PF 2.6.24.7-click #1
>>> [ 1360.187282] RIP: 0010:[<ffffffff88195197>] [<ffffffff88195197>] :click:_ZN10PollDevice8run_taskEP4Task+0x127/0x360
>>> [ 1360.187531] RSP: 0018:ffff8101bc015e70 EFLAGS: 00010246
>>> [ 1360.187587] RAX: 0000000000000008 RBX: 0000000000000000 RCX: ffff8101bcb6a780
>>> [ 1360.187645] RDX: 0000000000000006 RSI: ffff8101bc959040 RDI: 0000000000000000
>>> [ 1360.187704] RBP: 0000000000000000 R08: ffff8101bfc026e0 R09: 0000000000000086
>>> [ 1360.187763] R10: 00000000000a06aa R11: ffff8101bc015dd0 R12: ffff8101bc197000
>>> [ 1360.187822] R13: 0000000000000000 R14: ffff8101bbc366c0 R15: ffff8101bbc36744
>>> [ 1360.187880] FS: 00002adc4ad5e6e0(0000) GS:ffff8101bf1b87c0(0000) knlGS:0000000000000000
>>> [ 1360.187955] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>> [ 1360.188011] CR2: 00000000000000c0 CR3: 0000000000201000 CR4: 00000000000006e0
>>> [ 1360.188070] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> [ 1360.188128] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>> [ 1360.188187] Process kclick (pid: 22129, threadinfo ffff8101bc014000, task ffff8101be8172e0)
>>> [ 1360.188261] Stack: 0000000000000000 ffffffff88197bc2 0000000000000000 0000000800000008
>>> [ 1360.188496] 0000000000000000 ffff8101bc1970b0 0000000000000000 000000000000007d
>>> [ 1360.188697] 0000000000000000 ffff8101bbc366c0 ffff8101bbc36744 ffffffff8813caf5
>>> [ 1360.188901] Call Trace:
>>> [ 1360.189138] [<ffffffff88197bc2>] :click:_ZN8ToDevice8run_taskEP4Task+0x112/0x580
>>> [ 1360.189338] [<ffffffff8813caf5>] :click:_ZN12RouterThread6driverEv+0x335/0x4c0
>>> [ 1360.189555] [<ffffffff881db8ad>] :click:_ZL11click_schedPv+0xcd/0x1d0
>>> [ 1360.189648] [<ffffffff8020cd88>] child_rip+0xa/0x12
>>> [ 1360.189840] [<ffffffff881db7e0>] :click:_ZL11click_schedPv+0x0/0x1d0
>>> [ 1360.189906] [<ffffffff8020cd7e>] child_rip+0x0/0x12
>>> [ 1360.189961]
>>> [ 1360.190009]
>>> [ 1360.190009] Code: 48 83 ab c0 00 00 00 0e 83 43 68 0e 48 8b 83 b8 00 00 00 48
>>> [ 1360.190911] RIP [<ffffffff88195197>] :click:_ZN10PollDevice8run_taskEP4Task+0x127/0x360
>>> [ 1360.191155] RSP <ffff8101bc015e70>
>>> [ 1360.191207] CR2: 00000000000000c0
>>> [ 1360.191299] ---[ end trace f8e2fe527d7ef925 ]---
>>
>> Well, I suggest you find where in the Click source the oops is happening.
>> To do that, you can:
>> - recompile and install the kernel with CONFIG_DEBUG_INFO
>> - recompile Click again
>> - reproduce this issue
>> - run gdb click.ko
>> - type the command 'info line *{eip}' to get the oopsing source file/line,
>>   e.g. info line *_ZN10PollDevice8run_taskEP4Task+0x127
>
> (gdb) info line *_ZN10PollDevice8run_taskEP4Task+0x11b
> Line 945 of "/lib/modules/2.6.24.7-click/build/include/linux/skbuff.h"
> starts at address 0x8722b <_ZN10PollDevice8run_taskEP4Task+283>
> and ends at 0x87233 <_ZN10PollDevice8run_taskEP4Task+291>
>
> That is skb_push(), which is probably the skb_push() at line 269 in
> polldevice.cc?
>
> Br, Nuutti
>
> _______________________________________________
> click mailing list
> [email protected]
> https://amsterdam.lcs.mit.edu/mailman/listinfo/click
