[E1000-devel] kernel 3.14.28 BUG_ON in skb_segment() called by ixgbe_poll() and napi

Chris Caputo Sun, 18 Jan 2015 00:03:07 -0800

Hi.  I am running Linux kernel 3.14.28 on the following hardware:

2x Intel Xeon E5420
SuperMicro X7DBE+ Rev 2.01
Intel 5000P (Blackford) Chipset
HotLava Systems Tambora 64G6 Part #6ST2830A2, PCI-e 2.0 (5GT/s), x8, 6-port, Intel 82599ES based, SFP+
32GB RAM


Got:

[375129.789047] BUG: unable to handle kernel NULL pointer dereference at 0000000
[375129.790004]  [<ffffffff813a16f5>] napi_gro_flush+0x65/0x80
[375129.790004]  [<ffffffff813a1729>] napi_complete+0x19/0x30
[375129.790004]  [<ffffffff812f9fbe>] ixgbe_poll+0x4ee/0x940
[375129.790004]  [<ffffffff813a183b>] net_rx_action+0xfb/0x1a0
[375129.790004]  [<ffffffff8104ec3c>] __do_softirq+0xdc/0x1f0
[375129.790004]  [<ffffffff8104ef5d>] irq_exit+0x9d/0xb0
[375129.790004]  [<ffffffff81003e33>] do_IRQ+0x53/0xf0
[375129.790004]  [<ffffffff814fddaa>] common_interrupt+0x6a/0x6a
[375129.790004]  <EOI>
[375129.790004]  [<ffffffff81074ac8>] ? sched_clock_cpu+0x88/0xb0
[375129.790004]  [<ffffffff8100a526>] ? default_idle+0x6/0x10
[375129.790004]  [<ffffffff8100ac96>] arch_cpu_idle+0x16/0x20
[375129.790004]  [<ffffffff810863c1>] cpu_startup_entry+0x91/0x180
[375129.790004]  [<ffffffff8102c13f>] start_secondary+0x19f/0x1f0
[375129.790004] Code: 4c 24 60 eb 21 0f 1f 80 00 00 00 00 41 83 c5 01 49 83 c4 10 48 83 c1 10 41 39 c3 0f 86 7b 01 00 00 41 89 c7 89 c2 45 39 e9 7f 37 <41> 8b 46 6c 41 39 46 68 0f 85 6d 03 00 00 45 8b a6 c4 00 00 00
[375129.790004] RIP  [<ffffffff8139567f>] skb_segment+0x5df/0x980
[375129.790004]  RSP <ffff88082fcc3828>
[375129.790004] CR2: 000000000000006c
[375129.790004] ---[ end trace ce413143217a96ad ]---
[375129.790004] Kernel panic - not syncing: Fatal exception in interrupt
[375129.790004] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
[375129.790004] Rebooting in 10 seconds..

And then just after rebooting:

[   53.268587] BUG: unable to handle kernel NULL pointer dereference at 00000000
[   53.269532]  [<ffffffff813a1729>] napi_complete+0x19/0x30
[   53.269532]  [<ffffffff812f9fbe>] ixgbe_poll+0x4ee/0x940
[   53.269532]  [<ffffffff812032c4>] ? timerqueue_del+0x24/0x70
[   53.269532]  [<ffffffff81203230>] ? timerqueue_add+0x60/0xb0
[   53.269532]  [<ffffffff813a183b>] net_rx_action+0xfb/0x1a0
[   53.269532]  [<ffffffff8104ec3c>] __do_softirq+0xdc/0x1f0
[   53.269532]  [<ffffffff8104ef5d>] irq_exit+0x9d/0xb0
[   53.269532]  [<ffffffff81003e33>] do_IRQ+0x53/0xf0
[   53.269532]  [<ffffffff814fddaa>] common_interrupt+0x6a/0x6a
[   53.269532]  <EOI>
[   53.269532]  [<ffffffff8100a526>] ? default_idle+0x6/0x10
[   53.269532]  [<ffffffff8100ac96>] arch_cpu_idle+0x16/0x20
[   53.269532]  [<ffffffff810863c1>] cpu_startup_entry+0x91/0x180
[   53.269532]  [<ffffffff8102c13f>] start_secondary+0x19f/0x1f0
[   53.269532] Code: 4c 24 60 eb 21 0f 1f 80 00 00 00 00 41 83 c5 01 49 83 c4 10 48 83 c1 10 41 39 c3 0f 86 7b 01 00 00 41 89 c7 89 c2 45 39 e9 7f 37 <41> 8b 46 6c 41 39 46 68 0f 85 6d 03 00 00 45 8b a6 c4 00 00 00
[   53.269532] RIP  [<ffffffff8139567f>] skb_segment+0x5df/0x980
[   53.269532]  RSP <ffff88082fd43840>
[   53.269532] CR2: 000000000000006c
[   53.269532] ---[ end trace 1c1a68627fa9d6de ]---
[   53.269532] Kernel panic - not syncing: Fatal exception in interrupt
[   53.269532] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
[   53.269532] Rebooting in 10 seconds..

I rebooted again and the system stayed up, but I don't know whether the 
crash will happen again.

The code which triggered the BUG is in skb_segment() in net/core/skbuff.c 
(line 3001 of kernel 3.14.28):

                while (pos < offset + len) {
                        if (i >= nfrags) {
>>>>                            BUG_ON(skb_headlen(list_skb));

                                i = 0;

Since the call stack includes ixgbe_poll() each time, I wonder if this 
might be an issue with the ixgbe driver or something others have seen?

Suggestions most welcome.

Thanks,
Chris

_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit 
http://communities.intel.com/community/wired
