On 07/11/2012 06:18 PM, akepner wrote:
> Using the 3.7.21 version of the ixgbe driver we can reliably 
> produce a crash with this signature: 
>
> BUG: unable to handle kernel NULL pointer dereference at 000000000000006c
> IP: [<ffffffffa005afef>] ixgbe_poll+0x9df/0x1710 [ixgbe]
> PGD 814c7b067 PUD 8074dd067 PMD 0 
> Oops: 0000 [#1] SMP 
> last sysfs file: /sys/devices/virtual/bypass/8-9/ping_watchdog
> CPU 2 
> Pid: 18925, comm: sport Tainted: P           ----------------   2.6.32-perf #1 To Be Filled By O.E.M.
> RIP: 0010:[<ffffffffa005afef>]  [<ffffffffa005afef>] ixgbe_poll+0x9df/0x1710 [ixgbe]
> RSP: 0018:ffff88080750b8b0  EFLAGS: 00010246
> RAX: 0000000000000000 RBX: ffff88040f816f00 RCX: 0000000000000000
> RDX: 0000000000000020 RSI: ffffc9000429c000 RDI: ffff88040f891d80
> RBP: ffff88080750b970 R08: 0000000000000100 R09: 0000000000000000
> R10: 0000000000000100 R11: ffff88080750bfd8 R12: 0000000000000000
> R13: ffffc900041221b8 R14: ffff8804077580b0 R15: 000000000000000b
> FS:  00007f61ccda9700(0000) GS:ffff880028280000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000000000000006c CR3: 0000000814436000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process sport (pid: 18925, threadinfo ffff88080750a000, task ffff880814beeb60)
> Stack:
>  0000000000000000  ffff8804148b0540 000000000000000e ffff88040703d1c0
> <0> ffff8804148b0598 ffff88080750b918 ffffffff815671ac 00000001000359c2
> <0> ffff880410744700 0000000000000040 ffff88040f891d80 0000004011087c9c
> Call Trace:
>  [<ffffffff815671ac>] ? ip_finish_output+0x13c/0x310
>  [<ffffffff8152b468>] net_rx_action+0xb8/0x400
>  [<ffffffff81517a84>] ? sock_def_readable+0x44/0x80
>  [<ffffffff81066a91>] __do_softirq+0xc1/0x1d0
>  [<ffffffff8100c1ec>] call_softirq+0x1c/0x30
>  [<ffffffff8100de25>] do_softirq+0x65/0xa0
>  [<ffffffff8106699a>] local_bh_enable+0x9a/0xb0
>  [<ffffffff815176fc>] lock_sock_nested+0xac/0xc0
>  [<ffffffff81641f0b>] ? _spin_unlock_bh+0x1b/0x20
>  [<ffffffff81517627>] ? release_sock+0xd7/0x100
>  [<ffffffff81571838>] tcp_recvmsg+0x38/0xe80
>  [<ffffffff812d4c19>] ? cpumask_next_and+0x29/0x50
>  [<ffffffff8104b6f4>] ? find_busiest_group+0x244/0xb10
>  [<ffffffff810544d2>] ? default_wake_function+0x12/0x20
>  [<ffffffff81516cf9>] sock_common_recvmsg+0x39/0x50
>  [<ffffffff81516829>] sock_aio_read+0x159/0x160
>  [<ffffffff8104dbd3>] ? perf_event_task_sched_out+0x33/0x80
>  [<ffffffff810097ac>] ? __switch_to+0x1ac/0x320
>  [<ffffffff815166d0>] ? sock_aio_read+0x0/0x160
>  [<ffffffff811533bb>] do_sync_readv_writev+0xfb/0x140
>  [<ffffffff810853b0>] ? autoremove_wake_function+0x0/0x40
>  [<ffffffff811543df>] do_readv_writev+0xcf/0x1f0
>  [<ffffffff8156dc0d>] ? do_tcp_getsockopt+0x3d/0x5f0
>  [<ffffffff81012879>] ? read_tsc+0x9/0x20
>  [<ffffffff8108fc13>] ? ktime_get+0x63/0xe0
>  [<ffffffff810650c2>] ? ns_to_timeval+0x12/0x40
>  [<ffffffff810896af>] ? hrtimer_get_remaining+0x3f/0x50
>  [<ffffffff811546d3>] vfs_readv+0x43/0x60
>  [<ffffffff811547d1>] sys_readv+0x51/0x80
>  [<ffffffff8100b132>] system_call_fastpath+0x16/0x1b
> Code: c1 e5 03 4c 03 6b 20 4d 8b 65 00 49 c7 45 00 00 00 00 00 0f ae e8 48 8b 
> 53 28 31 c0 f6 c2 10 74 0a 41 f7 06 00 00 1e 00 0f 95 c0 <41> 8b 74 24 6c 49 
> 8b 8c 24 b0 01 00 00 85 f6 0f 18 09 0f 85 c0 
> RIP  [<ffffffffa005afef>] ixgbe_poll+0x9df/0x1710 [ixgbe]
>  RSP <ffff88080750b8b0>
> CR2: 000000000000006c
> ---[ end trace 9db4623b9591cd54 ]---
>
> addr2line says this is happening on line 2028 below - so a NULL skb 
> pointer is being passed to skb_is_nonlinear():
>
>  1990 static bool ixgbe_clean_rx_irq_ps(struct ixgbe_q_vector *q_vector,
>  1991                                   struct ixgbe_ring *rx_ring,
>  1992                                   int budget)
>  1993 {
> .....
>  2021                 rmb();
>  2022 
>  2023                 pkt_is_rsc = ixgbe_get_rsc_state(rx_ring, rx_desc);
>  2024 
>  2025                 prefetch(skb->data);
>  2026 
>  2027                 /* pull the header of the skb in if no data is already present */
>  2028                 if (!skb_is_nonlinear(skb)) {
>  2029                         __skb_put(skb, ixgbe_get_hlen(rx_ring, rx_desc));
>
> Anyone have a guess as to the cause? Or have you seen similar? 
>
> One good clue that we've found is that the problem disappears if we 
> turn off irq balancing. 
>
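The register dump in the oops is consistent with the NULL skb reading. The faulting instruction bytes marked in the Code line (41 8b 74 24 6c) decode to a 32-bit load from 0x6c(%r12), R12 is 0, and CR2 is 0x6c. The sketch below only illustrates that arithmetic. The struct is a mock whose field list is an assumption chosen so that data_len lands at 0x6c; the real struct sk_buff layout depends on the kernel version and config, and mock_data_len_addr() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Mock of the leading fields of struct sk_buff.
 * ASSUMPTION: this layout is illustrative only; it is not copied from
 * the 2.6.32 sources. Pointer-sized fields are modelled as uint64_t so
 * the offsets do not depend on the machine compiling this sketch.
 */
struct mock_sk_buff {
    uint64_t next;
    uint64_t prev;
    uint64_t sk;
    uint64_t tstamp;
    uint64_t dev;
    uint64_t dst;
    uint64_t sp;
    char     cb[48];
    uint32_t len;
    uint32_t data_len;   /* the field skb_is_nonlinear() tests */
};

/* Address that a load of skb->data_len touches for a given skb value. */
static uint64_t mock_data_len_addr(uint64_t skb_ptr)
{
    return skb_ptr + offsetof(struct mock_sk_buff, data_len);
}
```

With skb_ptr set to 0 (the value of R12 in the dump), the helper yields 0x6c under this assumed layout, which matches CR2.
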
It seems like it might be some sort of memory corruption.  In order to
get into that state the DD bit has to be set for the descriptor and the
skb would have to be NULL.
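
To make that state concrete, here is a minimal sketch of a check for "DD set but no skb attached". Everything in it is hypothetical (the mock_ types, the MOCK_RXD_STAT_DD value and mock_check_rx_slot()); it is not the ixgbe code, just a way to flag the inconsistent slot while debugging instead of dereferencing the NULL pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "descriptor done" status bit for this sketch. */
#define MOCK_RXD_STAT_DD 0x01u

/* Simplified stand-ins for a receive descriptor and its software buffer. */
struct mock_rx_desc {
    uint32_t status;
};

struct mock_rx_buffer {
    void *skb;
};

enum mock_rx_state {
    MOCK_RX_NOT_READY,     /* hardware has not written the descriptor back */
    MOCK_RX_OK,            /* descriptor done and an skb is attached */
    MOCK_RX_INCONSISTENT   /* descriptor done but no skb: the crash case */
};

/* Classify one ring slot before the receive path touches buf->skb. */
static enum mock_rx_state
mock_check_rx_slot(const struct mock_rx_desc *desc,
                   const struct mock_rx_buffer *buf)
{
    if (!(desc->status & MOCK_RXD_STAT_DD))
        return MOCK_RX_NOT_READY;
    if (buf->skb == NULL)
        return MOCK_RX_INCONSISTENT;
    return MOCK_RX_OK;
}
```

A check like this would not fix the underlying problem. It would only turn the oops into something that can be logged along with the ring index and the descriptor contents.
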

What part are you running this on?  Is it an 82599?  If so, you
might be running into a FIFO corruption issue, however it was my
understanding that the driver wouldn't allow you to enable the packet
split mode on that part.  If you are using an 82599 you may want to use
a newer driver since we rewrote the receive path to allow us to use
paged frames without using the hardware packet split mode.

Thanks,

Alex


