Usually this is an issue in the kernel rather than in the driver. Have you tried
the latest stable kernel or the latest 3.14 release (which is 3.14.29)?

Todd Fujinaka
Software Application Engineer
Networking Division (ND)
Intel Corporation
todd.fujin...@intel.com
(503) 712-4565

-----Original Message-----
From: Chris Caputo [mailto:ccap...@alt.net] 
Sent: Saturday, January 17, 2015 11:34 PM
To: e1000-devel@lists.sourceforge.net
Subject: [E1000-devel] kernel 3.14.28 BUG_ON in skb_segment() called by 
ixgbe_poll() and napi

Hi.  I am running linux kernel 3.14.28 with related hardware as follows:

2x Intel Xeon E5420
SuperMicro X7DBE+ Rev 2.01
Intel 5000P (Blackford) Chipset
HotLava Systems Tambora 64G6 Part #6ST2830A2, PCI-e 2.0 (5GT/s), x8, 6-port,
Intel 82599ES based, SFP+
32GB RAM

Got:

[375129.789047] BUG: unable to handle kernel NULL pointer dereference at 0000000
[375129.790004]  [<ffffffff813a16f5>] napi_gro_flush+0x65/0x80
[375129.790004]  [<ffffffff813a1729>] napi_complete+0x19/0x30
[375129.790004]  [<ffffffff812f9fbe>] ixgbe_poll+0x4ee/0x940
[375129.790004]  [<ffffffff813a183b>] net_rx_action+0xfb/0x1a0
[375129.790004]  [<ffffffff8104ec3c>] __do_softirq+0xdc/0x1f0
[375129.790004]  [<ffffffff8104ef5d>] irq_exit+0x9d/0xb0
[375129.790004]  [<ffffffff81003e33>] do_IRQ+0x53/0xf0
[375129.790004]  [<ffffffff814fddaa>] common_interrupt+0x6a/0x6a
[375129.790004]  <EOI>
[375129.790004]  [<ffffffff81074ac8>] ? sched_clock_cpu+0x88/0xb0
[375129.790004]  [<ffffffff8100a526>] ? default_idle+0x6/0x10
[375129.790004]  [<ffffffff8100ac96>] arch_cpu_idle+0x16/0x20
[375129.790004]  [<ffffffff810863c1>] cpu_startup_entry+0x91/0x180
[375129.790004]  [<ffffffff8102c13f>] start_secondary+0x19f/0x1f0
[375129.790004] Code: 4c 24 60 eb 21 0f 1f 80 00 00 00 00 41 83 c5 01 49 83 c4 10
                48 83 c1 10 41 39 c3 0f 86 7b 01 00 00 41 89 c7 89 c2 45 39 e9
                7f 37 <41> 8b 46
                6c 41 39 46 68 0f 85 6d 03 00 00 45 8b a6 c4 00 00 00
[375129.790004] RIP  [<ffffffff8139567f>] skb_segment+0x5df/0x980
[375129.790004]  RSP <ffff88082fcc3828>
[375129.790004] CR2: 000000000000006c
[375129.790004] ---[ end trace ce413143217a96ad ]---
[375129.790004] Kernel panic - not syncing: Fatal exception in interrupt
[375129.790004] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range:
0xffffffff80000000-0xffffffff9fffffff)
[375129.790004] Rebooting in 10 seconds..

And then just after rebooting:

[   53.268587] BUG: unable to handle kernel NULL pointer dereference at 00000000
[   53.269532]  [<ffffffff813a1729>] napi_complete+0x19/0x30
[   53.269532]  [<ffffffff812f9fbe>] ixgbe_poll+0x4ee/0x940
[   53.269532]  [<ffffffff812032c4>] ? timerqueue_del+0x24/0x70
[   53.269532]  [<ffffffff81203230>] ? timerqueue_add+0x60/0xb0
[   53.269532]  [<ffffffff813a183b>] net_rx_action+0xfb/0x1a0
[   53.269532]  [<ffffffff8104ec3c>] __do_softirq+0xdc/0x1f0
[   53.269532]  [<ffffffff8104ef5d>] irq_exit+0x9d/0xb0
[   53.269532]  [<ffffffff81003e33>] do_IRQ+0x53/0xf0
[   53.269532]  [<ffffffff814fddaa>] common_interrupt+0x6a/0x6a
[   53.269532]  <EOI>
[   53.269532]  [<ffffffff8100a526>] ? default_idle+0x6/0x10
[   53.269532]  [<ffffffff8100ac96>] arch_cpu_idle+0x16/0x20
[   53.269532]  [<ffffffff810863c1>] cpu_startup_entry+0x91/0x180
[   53.269532]  [<ffffffff8102c13f>] start_secondary+0x19f/0x1f0
[   53.269532] Code: 4c 24 60 eb 21 0f 1f 80 00 00 00 00 41 83 c5 01 49 83 c4 10
[              48 83 c1 10 41 39 c3 0f 86 7b 01 00 00 41 89 c7 89 c2 45 39 e9 
7f 37 <41> 8b 46
[              6c 41 39 46 68 0f 85 6d 03 00 00 45 8b a6 c4 00 00 00
[   53.269532] RIP  [<ffffffff8139567f>] skb_segment+0x5df/0x980
[   53.269532]  RSP <ffff88082fd43840>
[   53.269532] CR2: 000000000000006c
[   53.269532] ---[ end trace 1c1a68627fa9d6de ]---
[   53.269532] Kernel panic - not syncing: Fatal exception in interrupt
[   53.269532] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 
0xffffffff80000000-0xffffffff9fffffff)
[   53.269532] Rebooting in 10 seconds..

Rebooted again and the system stayed up, but I don't know if it will happen 
again.

The code which triggered the BUG is in skb_segment() in net/core/skbuff.c (line 
3001 of kernel 3.14.28):

                while (pos < offset + len) {
                        if (i >= nfrags) {
>>>>                            BUG_ON(skb_headlen(list_skb));

                                i = 0;
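
One possible reading of the oops, offered as a sketch and not a confirmed
diagnosis: the faulting bytes "<41> 8b 46 6c" followed by "41 39 46 68" decode
to a 32-bit load from 0x6c(%r14) and a compare against 0x68(%r14), and CR2 is
0x6c. That would fit skb_headlen(list_skb) being evaluated with list_skb equal
to NULL, so the fault happens while reading the field before BUG_ON itself can
fire. The struct below is a hypothetical stand-in, not the real struct sk_buff;
its padding is chosen only so the two fields land at the offsets the oops
implies, and every model_* name is invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff: only the two fields that
 * skb_headlen() reads, padded to the offsets inferred from the oops
 * (0x68 and 0x6c). The real 3.14 layout is not reproduced here. */
struct model_skb {
    unsigned char pad[0x68];  /* everything before len, not modeled */
    unsigned int  len;        /* total bytes: linear part plus fragments */
    unsigned int  data_len;   /* bytes held in fragments only */
};

_Static_assert(offsetof(struct model_skb, len) == 0x68, "len offset");
_Static_assert(offsetof(struct model_skb, data_len) == 0x6c, "data_len offset");

/* Same arithmetic as the kernel's skb_headlen(): size of the linear part. */
static unsigned int model_headlen(const struct model_skb *skb)
{
    return skb->len - skb->data_len;
}

/* Address touched when reading data_len through a pointer whose value is
 * `base`. With base == 0 this is the address a NULL list_skb would fault on. */
static size_t model_fault_addr(size_t base)
{
    return base + offsetof(struct model_skb, data_len);
}

/* Non-zero when BUG_ON(skb_headlen(list_skb)) would fire for a valid skb,
 * i.e. when the frag_list member still carries linear data. */
static int model_bug_would_fire(const struct model_skb *list_skb)
{
    return model_headlen(list_skb) != 0;
}
```

Under these assumptions, model_fault_addr(0) evaluates to 0x6c, the same value
as CR2 in both traces, while a fully paged buffer (len equal to data_len) would
pass the check without tripping BUG_ON.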

Since the call stack includes ixgbe_poll() each time, I wonder if this might be 
an issue with the ixgbe driver or something others have seen?

Suggestions most welcome.

Thanks,
Chris

------------------------------------------------------------------------------
New Year. New Location. New Benefits. New Data Center in Ashburn, VA.
GigeNET is offering a free month of service with a new server in Ashburn.
Choose from 2 high performing configs, both with 100TB of bandwidth.
Higher redundancy.Lower latency.Increased capacity.Completely compliant.
http://p.sf.net/sfu/gigenet
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit 
http://communities.intel.com/community/wired
