3.14.29 crashed as well. The trace is below; it hits the same BUG in 
skb_segment() in net/core/skbuff.c.

What should I try next?

Thanks,
Chris

[ 4010.835995] BUG: unable to handle kernel NULL pointer dereference at 
000000000000006c 
[ 4010.836048] IP: [<ffffffff813955df>] skb_segment+0x5df/0x980
[ 4010.836075] PGD 7f8296067 PUD 7f8298067 PMD 0
[ 4010.836130] Oops: 0000 [#1] SMP
[ 4010.836158] Modules linked in: w83627hf_wdt ip_vs_wlc ip_vs_wlib ip_vs 
libcrc32 nf_conntrack bonding e1000 e1000e 
[ 4010.836250] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.14.29 
[ 4010.836261] Hardware name: Supermicro X7DB8/X7DB8, BIOS 2.1 06/23/2008
[ 4010.836301] task: ffffffff81810460 ti: ffffffff81800000 task.ti: 
ffffffff81800000
[ 4010.836346] RIP: 0010:[<ffffffff813955df>]  [<ffffffff813955df>] 
skb_segment+0x5df/0x980 
[ 4010.836407] RSP: 0018:ffff88082fc03730  EFLAGS: 00010246                     
[ 4010.836503] RAX: 0000000000000a95 RBX: ffff88080b1ddb00 RCX: ffff8805e2edff10
[ 4010.836591] RDX: 0000000000000a95 RSI: 00000000000004d1 RDI: ffffea00032c6480
[ 4010.836680] RBP: ffff88082fc03800 R08: 0000000000010496 R09: 0000000000000002
[ 4010.836769] R10: ffff88080b1dcd00 R11: 0000000000010a12 R12: ffff8808073c9810
[ 4010.836842] R13: 0000000000000002 R14: 0000000000000000 R15: 0000000000000a95
[ 4010.836842] FS:  0000000000000000(0000) GS:ffff88082fc00000(0000) 
knlGS:0000000000000000 
[ 4010.836842] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 4010.836842] CR2: 000000000000006c CR3: 00000000c9fc8000 CR4: 00000000000007f0
[ 4010.836842] Stack:
[ 4010.836842]  ffffffff813a2f0b ffff88082fc03758 0000000000010496 
fffffffffffefb6a
[ 4010.836842]  0000000000010a12 0000000000000066 ffff88080b1dcd00 
0000000100001ee0 
[ 4010.836842]  ffffffffffffffda 00000000000104bc 000000260000057c 
ffff88080b1ddb00 
[ 4010.836842] Call Trace:
[ 4010.836842]  <IRQ>
[ 4010.836842]  [<ffffffff813a2f0b>] ? dev_queue_xmit+0xb/0x10
[ 4010.836842]  [<ffffffff8143c91d>] tcp_gso_segment+0x10d/0x3f0
[ 4010.836842]  [<ffffffff814ccf42>] ipv6_gso_segment+0x102/0x2c0
[ 4010.836842]  [<ffffffff813a22e3>] skb_mac_gso_segment+0x93/0x170
[ 4010.836842]  [<ffffffff8145adaf>] gre_gso_segment+0x12f/0x360
[ 4010.836842]  [<ffffffff8144c38d>] inet_gso_segment+0x12d/0x360
[ 4010.836842]  [<ffffffff813a22e3>] skb_mac_gso_segment+0x93/0x170
[ 4010.836842]  [<ffffffff813a241b>] __skb_gso_segment+0x5b/0xc0
[ 4010.836842]  [<ffffffff813a273d>] dev_hard_start_xmit+0x17d/0x4d0
[ 4010.836842]  [<ffffffff813be290>] sch_direct_xmit+0xe0/0x1c0
[ 4010.836842]  [<ffffffff813be3f9>] __qdisc_run+0x89/0x150
[ 4010.836842]  [<ffffffff813a2d12>] __dev_queue_xmit+0x282/0x470
[ 4010.836842]  [<ffffffff813a2f0b>] dev_queue_xmit+0xb/0x10
[ 4010.836842]  [<ffffffff813aa832>] neigh_connected_output+0xb2/0xf0
[ 4010.836842]  [<ffffffff81419778>] ip_finish_output+0x1c8/0x400
[ 4010.836842]  [<ffffffff8141acd8>] ip_output+0x88/0x90
[ 4010.836842]  [<ffffffff81416cb6>] ip_forward_finish+0x86/0x1c0
[ 4010.836842]  [<ffffffff81417163>] ip_forward+0x373/0x440
[ 4010.836842]  [<ffffffff81414ea8>] ip_rcv_finish+0x78/0x340
[ 4010.836842]  [<ffffffff814157dc>] ip_rcv+0x2cc/0x3e0
[ 4010.836842]  [<ffffffff813a120e>] __netif_receive_skb_core+0x5be/0x7d0
[ 4010.836842]  [<ffffffff814cd162>] ? tcp6_gro_complete+0x62/0x70
[ 4010.836842]  [<ffffffff813a1438>] __netif_receive_skb+0x18/0x60
[ 4010.836842]  [<ffffffff813a14a8>] netif_receive_skb_internal+0x28/0x90
[ 4010.836842]  [<ffffffff813a15bc>] napi_gro_complete+0x9c/0xd0
[ 4010.836842]  [<ffffffff813a1ad6>] dev_gro_receive+0x296/0x440
[ 4010.836842]  [<ffffffff813a1d7d>] napi_gro_receive+0xd/0x80
[ 4010.836842]  [<ffffffff812f8c1c>] ixgbe_clean_rx_irq+0x62c/0x9e0
[ 4010.836842]  [<ffffffff812f9ec3>] ixgbe_poll+0x493/0x940
[ 4010.836842]  [<ffffffff8107fb8f>] ? __wake_up+0x3f/0x50
[ 4010.836842]  [<ffffffff813a179b>] net_rx_action+0xfb/0x1a0
[ 4010.836842]  [<ffffffff8104ec3c>] __do_softirq+0xdc/0x
[ 4010.836842]  [<ffffffff8104ef5d>] irq_exit+0x9d/0xb0
[ 4010.836842]  [<ffffffff81003e33>] do_IRQ+0x53/0xf0
[ 4010.836842]  [<ffffffff814fdd2a>] common_interrupt+0x6a/0x6a
[ 4010.836842]  <EOI>
[ 4010.836842]  [<ffffffff8100a526>] ? default_idle+0x6/0x10
[ 4010.836842]  [<ffffffff8100ac96>] arch_cpu_idle+0x16/0x20
[ 4010.836842]  [<ffffffff810863a1>] cpu_startup_entry+0x91/0x180
[ 4010.836842]  [<ffffffff814f1202>] rest_init+0x72/0x80
[ 4010.836842]  [<ffffffff81892da6>] start_kernel+0x340/0x34b
[ 4010.836842]  [<ffffffff8189286f>] ? repair_env_string+0x5c/0x5c
[ 4010.836842]  [<ffffffff818925ad>] x86_64_start_reservations+0x2a/0x2c
[ 4010.836842]  [<ffffffff81892676>] x86_64_start_kernel+0xc7/0xca
[ 4010.836842] Code: 4c 24 60 eb 21 0f 1f 80 00 00 00 00 41 83 c5 01 49 83 c4 10
[ 4010.836842] 48 83 c1 10 41 39 c3 0f 86 7b 01 00 00 41 89 c7 89 c2 45 39 e9 
7f 37 <41> 8b 46
[ 4010.836842] 6c 41 39 46 68 0f 85 6d 03 00 00 45 8b a6 c4 00 00 00
[ 4010.836842] RIP  [<ffffffff813955df>] skb_segment+0x5df/0x980
[ 4010.836842]  RSP <ffff88082fc03730>
[ 4010.836842] CR2: 000000000000006c
[ 4010.836842] ---[ end trace ad63244a1b43b393 ]---
[ 4010.836842] Kernel panic - not syncing: Fatal exception in interrupt
[ 4010.836842] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 
0xffffffff80000000-0xffffffff9fffffff)
[ 4010.836842] Rebooting in 10 seconds..
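
One way to read the faulting instruction out of the "Code:" bytes in the trace above is sketched below. It is only a rough illustration: faulting_bytes() and decode_mov_load_disp8() are hypothetical helper names, and the decoder handles just the single x86-64 encoding seen here (REX prefix, opcode 0x8b, ModRM with an 8-bit displacement). It is not a general disassembler.

```python
# Opcode bytes copied from the "Code:" lines of the 3.14.29 oops above.
CODE = (
    "4c 24 60 eb 21 0f 1f 80 00 00 00 00 41 83 c5 01 49 83 c4 10 "
    "48 83 c1 10 41 39 c3 0f 86 7b 01 00 00 41 89 c7 89 c2 45 39 e9 "
    "7f 37 <41> 8b 46 6c 41 39 46 68 0f 85 6d 03 00 00 45 8b a6 c4 00 00 00"
)

REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def faulting_bytes(code):
    """Return the bytes starting at the <..>-marked faulting instruction."""
    tokens = code.split()
    start = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    return bytes(int(t.strip("<>"), 16) for t in tokens[start:])


def decode_mov_load_disp8(insn):
    """Decode only 'REX, 0x8b, ModRM(mod=01), disp8' (a 32-bit load).

    Returns (base_register, displacement); raises ValueError otherwise.
    """
    rex, opcode, modrm, disp = insn[0], insn[1], insn[2], insn[3]
    if rex & 0xF0 != 0x40 or opcode != 0x8B:
        raise ValueError("not a REX-prefixed 0x8b mov")
    mod, rm = modrm >> 6, modrm & 7
    if mod != 1 or rm == 4:  # rm == 4 would mean a SIB byte follows
        raise ValueError("unsupported addressing form")
    base = REG64[rm | ((rex & 1) << 3)]  # REX.B extends the base register
    if disp >= 0x80:  # disp8 is a signed byte
        disp -= 0x100
    return base, disp


base, disp = decode_mov_load_disp8(faulting_bytes(CODE))
print(f"load from {disp:#x}(%{base})")  # prints "load from 0x6c(%r14)"
```

With R14 = 0 in the register dump, that load targets address 0x6c, which matches CR2. If r14 holds list_skb at that point (an inference, not something confirmed in this thread; it would need checking against a disassembly of the actual build), the NULL dereference would be happening while skb_headlen(list_skb) is evaluated, rather than the BUG_ON condition itself firing.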

On Mon, 19 Jan 2015, Chris Caputo wrote:
> Hey Todd,
> 
> Am trying 3.14.29 now...
> 
> By the way, a day or two before the 3.14.28 crash I took one of the 10G 
> interfaces down with ifconfig.  Not sure whether it is related, but I am 
> mentioning it in case it is useful.
> 
> Thanks,
> Chris
> 
> On Mon, 19 Jan 2015, Fujinaka, Todd wrote:
> > Usually this isn't an issue in the driver but in the kernel. Have you 
> > tried the latest stable or the latest in 3.14 (which is 3.14.29)?
> > 
> > Todd Fujinaka
> > Software Application Engineer
> > Networking Division (ND)
> > Intel Corporation
> > todd.fujin...@intel.com
> > (503) 712-4565
> > 
> > -----Original Message-----
> > From: Chris Caputo [mailto:ccap...@alt.net] 
> > Sent: Saturday, January 17, 2015 11:34 PM
> > To: e1000-devel@lists.sourceforge.net
> > Subject: [E1000-devel] kernel 3.14.28 BUG_ON in skb_segment() called by 
> > ixgbe_poll() and napi
> > 
> > Hi.  I am running linux kernel 3.14.28 with related hardware as follows:
> > 
> > 2x Intel Xeon E5420
> > SuperMicro X7DBE+ Rev 2.01
> > Intel 5000P (Blackford) Chipset
> > HotLava Systems Tambora 64G6 Part #6ST2830A2, PCI-e 2.0 (5GT/s), x8, 
> > 6-port, Intel 82599ES based, SFP+
> > 32GB RAM
> > 
> > Got:
> > 
> > [375129.789047] BUG: unable to handle kernel NULL pointer dereference at 
> > 0000000
> > [375129.790004]  [<ffffffff813a16f5>] napi_gro_flush+0x65/0x80
> > [375129.790004]  [<ffffffff813a1729>] napi_complete+0x19/0x30
> > [375129.790004]  [<ffffffff812f9fbe>] ixgbe_poll+0x4ee/0x940
> > [375129.790004]  [<ffffffff813a183b>] net_rx_action+0xfb/0x1a0
> > [375129.790004]  [<ffffffff8104ec3c>] __do_softirq+0xdc/0x1f0
> > [375129.790004]  [<ffffffff8104ef5d>] irq_exit+0x9d/0xb0
> > [375129.790004]  [<ffffffff81003e33>] do_IRQ+0x53/0xf0
> > [375129.790004]  [<ffffffff814fddaa>] common_interrupt+0x6a/0x6a
> > [375129.790004]  <EOI>
> > [375129.790004]  [<ffffffff81074ac8>] ? sched_clock_cpu+0x88/0xb0
> > [375129.790004]  [<ffffffff8100a526>] ? default_idle+0x6/0x10
> > [375129.790004]  [<ffffffff8100ac96>] arch_cpu_idle+0x16/0x20
> > [375129.790004]  [<ffffffff810863c1>] cpu_startup_entry+0x91/0x180
> > [375129.790004]  [<ffffffff8102c13f>] start_secondary+0x19f/0x1f0
> > [375129.790004] Code: 4c 24 60 eb 21 0f 1f 80 00 00 00 00 41 83 c5 01 49 83 
> > c4 10
> >                 48 83 c1 10 41 39 c3 0f 86 7b 01 00 00 41 89 c7 89 c2 45 39 
> > e9 7f 37 <41> 8b 46
> >                 6c 41 39 46 68 0f 85 6d 03 00 00 45 8b a6 c4 00 00 00
> > [375129.790004] RIP  [<ffffffff8139567f>] skb_segment+0x5df/0x980
> > [375129.790004]  RSP <ffff88082fcc3828>
> > [375129.790004] CR2: 000000000000006c
> > [375129.790004] ---[ end trace ce413143217a96ad ]---
> > [375129.790004] Kernel panic - not syncing: Fatal exception in interrupt
> > [375129.790004] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation 
> > range: 0xffffffff80000000-0xffffffff9fffffff)
> > [375129.790004] Rebooting in 10 seconds..
> > 
> > And then just after rebooting:
> > 
> > [   53.268587] BUG: unable to handle kernel NULL pointer dereference at 
> > 00000000
> > [   53.269532]  [<ffffffff813a1729>] napi_complete+0x19/0x30
> > [   53.269532]  [<ffffffff812f9fbe>] ixgbe_poll+0x4ee/0x940
> > [   53.269532]  [<ffffffff812032c4>] ? timerqueue_del+0x24/0x70
> > [   53.269532]  [<ffffffff81203230>] ? timerqueue_add+0x60/0xb0
> > [   53.269532]  [<ffffffff813a183b>] net_rx_action+0xfb/0x1a0
> > [   53.269532]  [<ffffffff8104ec3c>] __do_softirq+0xdc/0x1f0
> > [   53.269532]  [<ffffffff8104ef5d>] irq_exit+0x9d/0xb0
> > [   53.269532]  [<ffffffff81003e33>] do_IRQ+0x53/0xf0
> > [   53.269532]  [<ffffffff814fddaa>] common_interrupt+0x6a/0x6a
> > [   53.269532]  <EOI>
> > [   53.269532]  [<ffffffff8100a526>] ? default_idle+0x6/0x10
> > [   53.269532]  [<ffffffff8100ac96>] arch_cpu_idle+0x16/0x20
> > [   53.269532]  [<ffffffff810863c1>] cpu_startup_entry+0x91/0x180
> > [   53.269532]  [<ffffffff8102c13f>] start_secondary+0x19f/0x1f0
> > [   53.269532] Code: 4c 24 60 eb 21 0f 1f 80 00 00 00 00 41 83 c5 01 49 83 
> > c4 10
> >                48 83 c1 10 41 39 c3 0f 86 7b 01 00 00 41 89 c7 89 c2 45 39 
> > e9 7f 37 <41> 8b 46
> >                6c 41 39 46 68 0f 85 6d 03 00 00 45 8b a6 c4 00 00 00
> > [   53.269532] RIP  [<ffffffff8139567f>] skb_segment+0x5df/0x980
> > [   53.269532]  RSP <ffff88082fd43840>
> > [   53.269532] CR2: 000000000000006c
> > [   53.269532] ---[ end trace 1c1a68627fa9d6de ]---
> > [   53.269532] Kernel panic - not syncing: Fatal exception in interrupt
> > [   53.269532] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation 
> > range: 0xffffffff80000000-0xffffffff9fffffff)
> > [   53.269532] Rebooting in 10 seconds..
> > 
> > Rebooted again and the system stayed up, but I don't know if it will happen 
> > again.
> > 
> > The code which triggered the BUG is in skb_segment() in net/core/skbuff.c 
> > (line 3001 of kernel 3.14.28):
> > 
> >                 while (pos < offset + len) {
> >                         if (i >= nfrags) {
> > >>>>                            BUG_ON(skb_headlen(list_skb));
> > 
> >                                 i = 0;
> > 
> > Since the call stack includes ixgbe_poll() each time, I wonder if this 
> > might be an issue with the ixgbe driver or something others have seen?
> > 
> > Suggestions most welcome.
> > 
> > Thanks,
> > Chris