Re: [E1000-devel] kernel 3.14.28 BUG_ON in skb_segment() called by ixgbe_poll() and napi

Chris Caputo Mon, 19 Jan 2015 15:25:11 -0800

Hey Todd,

I am trying 3.14.29 now...

By the way, a day or two before the crash on 3.14.28, I took one of the 10G 
interfaces down with ifconfig.  Not sure if it is related, but I am pointing 
it out in case it is useful.

Thanks,
Chris

On Mon, 19 Jan 2015, Fujinaka, Todd wrote:
> Usually this isn't an issue in the driver but in the kernel. Have you 
> tried the latest stable or the latest in 3.14 (which is 3.14.29)?
> 
> Todd Fujinaka
> Software Application Engineer
> Networking Division (ND)
> Intel Corporation
> todd.fujin...@intel.com
> (503) 712-4565
> 
> -----Original Message-----
> From: Chris Caputo [mailto:ccap...@alt.net] 
> Sent: Saturday, January 17, 2015 11:34 PM
> To: e1000-devel@lists.sourceforge.net
> Subject: [E1000-devel] kernel 3.14.28 BUG_ON in skb_segment() called by 
> ixgbe_poll() and napi
> 
> Hi.  I am running linux kernel 3.14.28 with related hardware as follows:
> 
> 2x Intel Xeon E5420
> SuperMicro X7DBE+ Rev 2.01
> Intel 5000P (Blackford) Chipset
> HotLava Systems Tambora 64G6 Part #6ST2830A2, PCI-e 2.0 (5GT/s), x8, 6-port, 
> Intel 82599ES based, SFP+
> 32GB RAM
> 
> Got:
> 
> [375129.789047] BUG: unable to handle kernel NULL pointer dereference at 
> 0000000
> [375129.790004]  [<ffffffff813a16f5>] napi_gro_flush+0x65/0x80
> [375129.790004]  [<ffffffff813a1729>] napi_complete+0x19/0x30
> [375129.790004]  [<ffffffff812f9fbe>] ixgbe_poll+0x4ee/0x940
> [375129.790004]  [<ffffffff813a183b>] net_rx_action+0xfb/0x1a0
> [375129.790004]  [<ffffffff8104ec3c>] __do_softirq+0xdc/0x1f0
> [375129.790004]  [<ffffffff8104ef5d>] irq_exit+0x9d/0xb0
> [375129.790004]  [<ffffffff81003e33>] do_IRQ+0x53/0xf0
> [375129.790004]  [<ffffffff814fddaa>] common_interrupt+0x6a/0x6a
> [375129.790004]  <EOI>
> [375129.790004]  [<ffffffff81074ac8>] ? sched_clock_cpu+0x88/0xb0
> [375129.790004]  [<ffffffff8100a526>] ? default_idle+0x6/0x10
> [375129.790004]  [<ffffffff8100ac96>] arch_cpu_idle+0x16/0x20
> [375129.790004]  [<ffffffff810863c1>] cpu_startup_entry+0x91/0x180
> [375129.790004]  [<ffffffff8102c13f>] start_secondary+0x19f/0x1f0
> [375129.790004] Code: 4c 24 60 eb 21 0f 1f 80 00 00 00 00 41 83 c5 01 49 83 c4 10
>                 48 83 c1 10 41 39 c3 0f 86 7b 01 00 00 41 89 c7 89 c2 45 39 e9
>                 7f 37 <41> 8b 46 6c 41 39 46 68 0f 85 6d 03 00 00 45 8b a6 c4
>                 00 00 00
> [375129.790004] RIP  [<ffffffff8139567f>] skb_segment+0x5df/0x980
> [375129.790004]  RSP <ffff88082fcc3828>
> [375129.790004] CR2: 000000000000006c
> [375129.790004] ---[ end trace ce413143217a96ad ]---
> [375129.790004] Kernel panic - not syncing: Fatal exception in interrupt
> [375129.790004] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 
> 0xffffffff80000000-0xffffffff9fffffff)
> [375129.790004] Rebooting in 10 seconds..
> 
> And then just after rebooting:
> 
> [   53.268587] BUG: unable to handle kernel NULL pointer dereference at 
> 00000000
> [   53.269532]  [<ffffffff813a1729>] napi_complete+0x19/0x30
> [   53.269532]  [<ffffffff812f9fbe>] ixgbe_poll+0x4ee/0x940
> [   53.269532]  [<ffffffff812032c4>] ? timerqueue_del+0x24/0x70
> [   53.269532]  [<ffffffff81203230>] ? timerqueue_add+0x60/0xb0
> [   53.269532]  [<ffffffff813a183b>] net_rx_action+0xfb/0x1a0
> [   53.269532]  [<ffffffff8104ec3c>] __do_softirq+0xdc/0x1f0
> [   53.269532]  [<ffffffff8104ef5d>] irq_exit+0x9d/0xb0
> [   53.269532]  [<ffffffff81003e33>] do_IRQ+0x53/0xf0
> [   53.269532]  [<ffffffff814fddaa>] common_interrupt+0x6a/0x6a
> [   53.269532]  <EOI>
> [   53.269532]  [<ffffffff8100a526>] ? default_idle+0x6/0x10
> [   53.269532]  [<ffffffff8100ac96>] arch_cpu_idle+0x16/0x20
> [   53.269532]  [<ffffffff810863c1>] cpu_startup_entry+0x91/0x180
> [   53.269532]  [<ffffffff8102c13f>] start_secondary+0x19f/0x1f0
> [   53.269532] Code: 4c 24 60 eb 21 0f 1f 80 00 00 00 00 41 83 c5 01 49 83 c4 10
>                48 83 c1 10 41 39 c3 0f 86 7b 01 00 00 41 89 c7 89 c2 45 39 e9
>                7f 37 <41> 8b 46 6c 41 39 46 68 0f 85 6d 03 00 00 45 8b a6 c4
>                00 00 00
> [   53.269532] RIP  [<ffffffff8139567f>] skb_segment+0x5df/0x980
> [   53.269532]  RSP <ffff88082fd43840>
> [   53.269532] CR2: 000000000000006c
> [   53.269532] ---[ end trace 1c1a68627fa9d6de ]---
> [   53.269532] Kernel panic - not syncing: Fatal exception in interrupt
> [   53.269532] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 
> 0xffffffff80000000-0xffffffff9fffffff)
> [   53.269532] Rebooting in 10 seconds..
> 
> Rebooted again and the system stayed up, but I don't know if it will happen 
> again.
> 
> The code which triggered the BUG is in skb_segment() in net/core/skbuff.c 
> (line 3001 of kernel 3.14.28):
> 
>                 while (pos < offset + len) {
>                         if (i >= nfrags) {
> >>>>                            BUG_ON(skb_headlen(list_skb));
> 
>                                 i = 0;
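
For anyone reading along, the Code: bytes and the CR2 value in both traces can be cross-checked against each other. The following is a minimal Python sketch, not part of the original report: the helper names (faulting_bytes, decode_mov_load_disp8) are made up for illustration, and it decodes only the single instruction form found at the marked byte (REX prefix, opcode 8B, ModRM with an 8-bit displacement). It is not a general x86 disassembler.

```python
# Code: bytes copied from the oops traces; <41> marks the faulting instruction.
CODE = (
    "4c 24 60 eb 21 0f 1f 80 00 00 00 00 41 83 c5 01 49 83 c4 10 "
    "48 83 c1 10 41 39 c3 0f 86 7b 01 00 00 41 89 c7 89 c2 45 39 e9 "
    "7f 37 <41> 8b 46 6c 41 39 46 68 0f 85 6d 03 00 00 45 8b a6 c4 "
    "00 00 00"
)
CR2 = 0x6C  # faulting address reported in both traces

# x86-64 general-purpose registers, indexed by the 4-bit register number.
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
        "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def faulting_bytes(code_line):
    """Return the bytes from the <..>-marked byte onward as integers."""
    tokens = code_line.split()
    start = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    return [int(t.strip("<>"), 16) for t in tokens[start:]]


def decode_mov_load_disp8(b):
    """Decode only 'REX 8B modrm disp8', i.e. mov r32, [base + disp8].

    Returns (base_register_name, displacement). Raises ValueError for any
    other encoding, since this sketch handles just this one form.
    """
    rex, opcode, modrm, disp = b[0], b[1], b[2], b[3]
    if (rex & 0xF0) != 0x40 or opcode != 0x8B:
        raise ValueError("not a REX-prefixed 8B load")
    mod, rm = modrm >> 6, modrm & 0x07
    if mod != 1 or rm == 4:  # mod 01 = disp8; rm 100 would mean a SIB byte
        raise ValueError("unsupported addressing form")
    base = REGS[rm | ((rex & 0x01) << 3)]  # REX.B extends the base register
    if disp >= 0x80:  # disp8 is sign-extended
        disp -= 0x100
    return base, disp


base, disp = decode_mov_load_disp8(faulting_bytes(CODE))
# The faulting address is base + disp, so the base value is CR2 - disp.
print(f"base={base} disp={disp:#x} implied base value={CR2 - disp:#x}")
```

Under those assumptions the marked instruction is a 32-bit load from 0x6c(%r14), and with CR2 = 0x6c the base register would have held 0. That would be consistent with list_skb being NULL when skb_headlen(list_skb) is evaluated, although the struct offsets have not been checked against this kernel build.
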
> 
> Since the call stack includes ixgbe_poll() each time, I wonder if this might 
> be an issue with the ixgbe driver or something others have seen?
> 
> Suggestions most welcome.
> 
> Thanks,
> Chris
> 
> ------------------------------------------------------------------------------
> New Year. New Location. New Benefits. New Data Center in Ashburn, VA.
> GigeNET is offering a free month of service with a new server in Ashburn.
> Choose from 2 high performing configs, both with 100TB of bandwidth.
> Higher redundancy.Lower latency.Increased capacity.Completely compliant.
> http://p.sf.net/sfu/gigenet
> _______________________________________________
> E1000-devel mailing list
> E1000-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/e1000-devel
> To learn more about Intel® Ethernet, visit 
> http://communities.intel.com/community/wired
> 
