Usually this isn't an issue in the driver but in the kernel. Have you tried the latest stable kernel, or the latest in the 3.14 series (which is 3.14.29)?

Todd Fujinaka
Software Application Engineer
Networking Division (ND)
Intel Corporation
todd.fujin...@intel.com
(503) 712-4565

-----Original Message-----
From: Chris Caputo [mailto:ccap...@alt.net]
Sent: Saturday, January 17, 2015 11:34 PM
To: e1000-devel@lists.sourceforge.net
Subject: [E1000-devel] kernel 3.14.28 BUG_ON in skb_segment() called by ixgbe_poll() and napi

Hi. I am running linux kernel 3.14.28 with related hardware as follows:

  2x Intel Xeon E5420
  SuperMicro X7DBE+ Rev 2.01
  Intel 5000P (Blackford) Chipset
  HotLava Systems Tambora 64G6 Part #6ST2830A2, PCI-e 2.0 (5GT/s), x8, 6-port, Intel 82599ES based, SFP+
  32GB RAM

Got:

  [375129.789047] BUG: unable to handle kernel NULL pointer dereference at 0000000
  [375129.790004] [<ffffffff813a16f5>] napi_gro_flush+0x65/0x80
  [375129.790004] [<ffffffff813a1729>] napi_complete+0x19/0x30
  [375129.790004] [<ffffffff812f9fbe>] ixgbe_poll+0x4ee/0x940
  [375129.790004] [<ffffffff813a183b>] net_rx_action+0xfb/0x1a0
  [375129.790004] [<ffffffff8104ec3c>] __do_softirq+0xdc/0x1f0
  [375129.790004] [<ffffffff8104ef5d>] irq_exit+0x9d/0xb0
  [375129.790004] [<ffffffff81003e33>] do_IRQ+0x53/0xf0
  [375129.790004] [<ffffffff814fddaa>] common_interrupt+0x6a/0x6a
  [375129.790004] <EOI>
  [375129.790004] [<ffffffff81074ac8>] ? sched_clock_cpu+0x88/0xb0
  [375129.790004] [<ffffffff8100a526>] ? default_idle+0x6/0x10
  [375129.790004] [<ffffffff8100ac96>] arch_cpu_idle+0x16/0x20
  [375129.790004] [<ffffffff810863c1>] cpu_startup_entry+0x91/0x180
  [375129.790004] [<ffffffff8102c13f>] start_secondary+0x19f/0x1f0
  [375129.790004] Code: 4c 24 60 eb 21 0f 1f 80 00 00 00 00 41 83 c5 01 49 83 c4 10 48 83 c1 10 41 39 c3 0f 86 7b 01 00 00 41 89 c7 89 c2 45 39 e9 7f 37 <41> 8b 46 6c 41 39 46 68 0f 85 6d 03 00 00 45 8b a6 c4 00 00 00
  [375129.790004] RIP [<ffffffff8139567f>] skb_segment+0x5df/0x980
  [375129.790004] RSP <ffff88082fcc3828>
  [375129.790004] CR2: 000000000000006c
  [375129.790004] ---[ end trace ce413143217a96ad ]---
  [375129.790004] Kernel panic - not syncing: Fatal exception in interrupt
  [375129.790004] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
  [375129.790004] Rebooting in 10 seconds..

And then just after rebooting:

  [ 53.268587] BUG: unable to handle kernel NULL pointer dereference at 00000000
  [ 53.269532] [<ffffffff813a1729>] napi_complete+0x19/0x30
  [ 53.269532] [<ffffffff812f9fbe>] ixgbe_poll+0x4ee/0x940
  [ 53.269532] [<ffffffff812032c4>] ? timerqueue_del+0x24/0x70
  [ 53.269532] [<ffffffff81203230>] ? timerqueue_add+0x60/0xb0
  [ 53.269532] [<ffffffff813a183b>] net_rx_action+0xfb/0x1a0
  [ 53.269532] [<ffffffff8104ec3c>] __do_softirq+0xdc/0x1f0
  [ 53.269532] [<ffffffff8104ef5d>] irq_exit+0x9d/0xb0
  [ 53.269532] [<ffffffff81003e33>] do_IRQ+0x53/0xf0
  [ 53.269532] [<ffffffff814fddaa>] common_interrupt+0x6a/0x6a
  [ 53.269532] <EOI>
  [ 53.269532] [<ffffffff8100a526>] ? default_idle+0x6/0x10
  [ 53.269532] [<ffffffff8100ac96>] arch_cpu_idle+0x16/0x20
  [ 53.269532] [<ffffffff810863c1>] cpu_startup_entry+0x91/0x180
  [ 53.269532] [<ffffffff8102c13f>] start_secondary+0x19f/0x1f0
  [ 53.269532] Code: 4c 24 60 eb 21 0f 1f 80 00 00 00 00 41 83 c5 01 49 83 c4 10 48 83 c1 10 41 39 c3 0f 86 7b 01 00 00 41 89 c7 89 c2 45 39 e9 7f 37 <41> 8b 46 6c 41 39 46 68 0f 85 6d 03 00 00 45 8b a6 c4 00 00 00
  [ 53.269532] RIP [<ffffffff8139567f>] skb_segment+0x5df/0x980
  [ 53.269532] RSP <ffff88082fd43840>
  [ 53.269532] CR2: 000000000000006c
  [ 53.269532] ---[ end trace 1c1a68627fa9d6de ]---
  [ 53.269532] Kernel panic - not syncing: Fatal exception in interrupt
  [ 53.269532] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
  [ 53.269532] Rebooting in 10 seconds..

Rebooted again and the system stayed up, but I don't know if it will happen again.

The code which triggered the BUG is in skb_segment() in net/core/skbuff.c (line 3001 of kernel 3.14.28):

          while (pos < offset + len) {
                  if (i >= nfrags) {
  >>>>                    BUG_ON(skb_headlen(list_skb));
                          i = 0;

Since the call stack includes ixgbe_poll() each time, I wonder if this might be an issue with the ixgbe driver or something others have seen?

Suggestions most welcome.

Thanks,
Chris
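
A note on reading the fault address in the traces above: CR2 is 0x6c in both, and the marked instruction in the Code: line (<41> 8b 46 6c) decodes to a 32-bit load from offset 0x6c off a base register, so the base pointer was most likely NULL. The sketch below only illustrates that arithmetic. struct fake_skb, fake_skb_headlen(), fault_address() and check_fake_layout() are hypothetical names, and the field offsets are assumptions chosen to match the oops; the real struct sk_buff layout depends on kernel version and config. It also assumes a 4-byte unsigned int.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff. Only the two fields that a
 * headlen computation needs are modelled, padded so that they land at the
 * offsets suggested by the oops (assumed, not taken from kernel headers). */
struct fake_skb {
    unsigned char pad[0x68];   /* everything before len, left opaque */
    unsigned int  len;         /* assumed offset 0x68 */
    unsigned int  data_len;    /* assumed offset 0x6c */
};

/* Same arithmetic as the kernel's skb_headlen(): bytes in the linear
 * area are the total length minus the paged (non-linear) length. */
unsigned int fake_skb_headlen(const struct fake_skb *skb)
{
    return skb->len - skb->data_len;
}

/* First address touched when data_len is read through a pointer whose
 * numeric value is base. With base == 0 this is the value the CPU would
 * report as the faulting address. */
size_t fault_address(size_t base)
{
    return base + offsetof(struct fake_skb, data_len);
}

/* Sanity check that the padding really produces the assumed offsets. */
void check_fake_layout(void)
{
    assert(offsetof(struct fake_skb, len) == 0x68);
    assert(offsetof(struct fake_skb, data_len) == 0x6c);
    assert(fault_address(0) == 0x6c);
}
```

Under these assumptions, fault_address(0) evaluates to 0x6c, the same value as CR2. That would be consistent with list_skb being NULL when skb_headlen(list_skb) was evaluated, though the trace alone does not confirm which field was being read.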

_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit http://communities.intel.com/community/wired