With version 3.7.21 of the ixgbe driver, we can reliably reproduce a
crash with this signature:

BUG: unable to handle kernel NULL pointer dereference at 000000000000006c
IP: [<ffffffffa005afef>] ixgbe_poll+0x9df/0x1710 [ixgbe]
PGD 814c7b067 PUD 8074dd067 PMD 0 
Oops: 0000 [#1] SMP 
last sysfs file: /sys/devices/virtual/bypass/8-9/ping_watchdog
CPU 2 
Pid: 18925, comm: sport Tainted: P           ----------------   2.6.32-perf #1 
To Be Filled By O.E.M.
RIP: 0010:[<ffffffffa005afef>]  [<ffffffffa005afef>] ixgbe_poll+0x9df/0x1710 [ixgbe]
RSP: 0018:ffff88080750b8b0  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88040f816f00 RCX: 0000000000000000
RDX: 0000000000000020 RSI: ffffc9000429c000 RDI: ffff88040f891d80
RBP: ffff88080750b970 R08: 0000000000000100 R09: 0000000000000000
R10: 0000000000000100 R11: ffff88080750bfd8 R12: 0000000000000000
R13: ffffc900041221b8 R14: ffff8804077580b0 R15: 000000000000000b
FS:  00007f61ccda9700(0000) GS:ffff880028280000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000000006c CR3: 0000000814436000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process sport (pid: 18925, threadinfo ffff88080750a000, task ffff880814beeb60)
Stack:
 0000000000000000  ffff8804148b0540 000000000000000e ffff88040703d1c0
<0> ffff8804148b0598 ffff88080750b918 ffffffff815671ac 00000001000359c2
<0> ffff880410744700 0000000000000040 ffff88040f891d80 0000004011087c9c
Call Trace:
 [<ffffffff815671ac>] ? ip_finish_output+0x13c/0x310
 [<ffffffff8152b468>] net_rx_action+0xb8/0x400
 [<ffffffff81517a84>] ? sock_def_readable+0x44/0x80
 [<ffffffff81066a91>] __do_softirq+0xc1/0x1d0
 [<ffffffff8100c1ec>] call_softirq+0x1c/0x30
 [<ffffffff8100de25>] do_softirq+0x65/0xa0
 [<ffffffff8106699a>] local_bh_enable+0x9a/0xb0
 [<ffffffff815176fc>] lock_sock_nested+0xac/0xc0
 [<ffffffff81641f0b>] ? _spin_unlock_bh+0x1b/0x20
 [<ffffffff81517627>] ? release_sock+0xd7/0x100
 [<ffffffff81571838>] tcp_recvmsg+0x38/0xe80
 [<ffffffff812d4c19>] ? cpumask_next_and+0x29/0x50
 [<ffffffff8104b6f4>] ? find_busiest_group+0x244/0xb10
 [<ffffffff810544d2>] ? default_wake_function+0x12/0x20
 [<ffffffff81516cf9>] sock_common_recvmsg+0x39/0x50
 [<ffffffff81516829>] sock_aio_read+0x159/0x160
 [<ffffffff8104dbd3>] ? perf_event_task_sched_out+0x33/0x80
 [<ffffffff810097ac>] ? __switch_to+0x1ac/0x320
 [<ffffffff815166d0>] ? sock_aio_read+0x0/0x160
 [<ffffffff811533bb>] do_sync_readv_writev+0xfb/0x140
 [<ffffffff810853b0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff811543df>] do_readv_writev+0xcf/0x1f0
 [<ffffffff8156dc0d>] ? do_tcp_getsockopt+0x3d/0x5f0
 [<ffffffff81012879>] ? read_tsc+0x9/0x20
 [<ffffffff8108fc13>] ? ktime_get+0x63/0xe0
 [<ffffffff810650c2>] ? ns_to_timeval+0x12/0x40
 [<ffffffff810896af>] ? hrtimer_get_remaining+0x3f/0x50
 [<ffffffff811546d3>] vfs_readv+0x43/0x60
 [<ffffffff811547d1>] sys_readv+0x51/0x80
 [<ffffffff8100b132>] system_call_fastpath+0x16/0x1b
Code: c1 e5 03 4c 03 6b 20 4d 8b 65 00 49 c7 45 00 00 00 00 00 0f ae e8 48 8b 
53 28 31 c0 f6 c2 10 74 0a 41 f7 06 00 00 1e 00 0f 95 c0 <41> 8b 74 24 6c 49 8b 
8c 24 b0 01 00 00 85 f6 0f 18 09 0f 85 c0 
RIP  [<ffffffffa005afef>] ixgbe_poll+0x9df/0x1710 [ixgbe]
 RSP <ffff88080750b8b0>
CR2: 000000000000006c
---[ end trace 9db4623b9591cd54 ]---

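addr2line maps the faulting address to line 2028 below. The register
dump agrees that the pointer is NULL: the faulting bytes in the Code:
line, <41> 8b 74 24 6c, decode to mov 0x6c(%r12),%esi, with R12 = 0 and
CR2 = 0x6c. So a NULL skb pointer is being passed to skb_is_nonlinear():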

 1990 static bool ixgbe_clean_rx_irq_ps(struct ixgbe_q_vector *q_vector,
 1991                                   struct ixgbe_ring *rx_ring,
 1992                                   int budget)
 1993 {
.....
 2021                 rmb();
 2022 
 2023                 pkt_is_rsc = ixgbe_get_rsc_state(rx_ring, rx_desc);
 2024 
 2025                 prefetch(skb->data);
 2026 
 2027                 /* pull the header of the skb in if no data is already present */
 2028                 if (!skb_is_nonlinear(skb)) {
 2029                         __skb_put(skb, ixgbe_get_hlen(rx_ring, rx_desc));
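
The fault address follows from the layout of struct sk_buff:
skb_is_nonlinear() simply returns skb->data_len, so with skb == NULL the
CPU reads from offsetof(struct sk_buff, data_len). The sketch below
mirrors the leading fields of a 2.6.32-era struct sk_buff, as we
understand it, to check that this offset comes out to 0x6c and matches
CR2. It assumes x86_64 (8-byte pointers) and CONFIG_XFRM=y (which adds
the sp pointer); the struct name skb_prefix_2632 and the helper
fault_addr_for_null_skb() are made up for illustration and are not the
kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical mirror of the first fields of a 2.6.32-era struct sk_buff.
 * Assumes x86_64 and CONFIG_XFRM=y; a different config shifts the offsets.
 */
struct skb_prefix_2632 {
    void *next;              /* 0x00 */
    void *prev;              /* 0x08 */
    void *sk;                /* 0x10 */
    int64_t tstamp;          /* 0x18: ktime_t is a 64-bit value */
    void *dev;               /* 0x20 */
    unsigned long _skb_dst;  /* 0x28 */
    void *sp;                /* 0x30: only present with CONFIG_XFRM */
    char cb[48];             /* 0x38 .. 0x67 */
    unsigned int len;        /* 0x68 */
    unsigned int data_len;   /* 0x6c: the field skb_is_nonlinear() returns */
};

/* Address a read of skb->data_len touches when skb == NULL. */
uintptr_t fault_addr_for_null_skb(void)
{
    return (uintptr_t)offsetof(struct skb_prefix_2632, data_len);
}
```

If the real kernel config drops CONFIG_XFRM, data_len would sit 8 bytes
lower, so the match with CR2 is itself a weak hint about the layout.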

Does anyone have a guess as to the cause, or has anyone seen something
similar?

One good clue we've found is that the problem disappears if we turn
off IRQ balancing.
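
If IRQ migration is the trigger, one way to hold interrupts still for a
controlled test is to stop the balancer and pin each ixgbe queue's IRQ
to a single CPU through /proc/irq/<N>/smp_affinity, which takes a hex
CPU bitmask. A minimal sketch, assuming CPUs 0 through 31 only (larger
systems use comma-separated 32-bit groups) and an IRQ number looked up
by hand in /proc/interrupts; format_affinity_mask() and pin_irq_to_cpu()
are hypothetical helpers, not part of the driver. Writing the file
requires root, and a running irqbalance daemon may overwrite the mask.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: hex mask selecting one CPU (CPUs 0..31 only). */
int format_affinity_mask(unsigned int cpu, char *buf, size_t len)
{
    if (cpu >= 32)
        return -1;
    int n = snprintf(buf, len, "%x", 1u << cpu);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: write the mask to /proc/irq/<irq>/smp_affinity. */
int pin_irq_to_cpu(unsigned int irq, unsigned int cpu)
{
    char path[64], mask[16];

    if (format_affinity_mask(cpu, mask, sizeof mask) != 0)
        return -1;
    snprintf(path, sizeof path, "/proc/irq/%u/smp_affinity", irq);

    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = (fputs(mask, f) != EOF);
    /* The buffered write reaches the kernel on fclose(), so a rejected
     * mask shows up as an fclose() failure. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

For example, pin_irq_to_cpu(75, 2) would write "4" to
/proc/irq/75/smp_affinity; the IRQ number 75 is made up.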

-- 
Arthur


_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel