Hi,

This is to report a bug we found in the Intel 10 Gigabit PCI Express Linux 
driver (ixgbe).
Our driver has been patched up to version 3.7.21.

Recently, we hit a crash with the following dump.

BUG: unable to handle kernel NULL pointer dereference at 000000000000006c
IP: [<ffffffffa005afef>] ixgbe_poll+0x9df/0x1720 [ixgbe]
PGD 3fc6cc067 PUD 3fc223067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/virtual/bypass/10-11/ping_watchdog
CPU 9
Pid: 3268, comm: sport Tainted: P           ----------------   2.6.32-perf #1
To Be Filled By O.E.M.
RIP: 0010:[<ffffffffa005afef>]  [<ffffffffa005afef>] ixgbe_poll+0x9df/0x1720
[ixgbe]
RSP: 0018:ffff8804075dd8b0  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff880410e90680 RCX: 0000000000000000
RDX: 0000000000000020 RSI: ffffc90004ca6000 RDI: ffff8804121ca580
RBP: ffff8804075dd970 R08: 0000000000000100 R09: 0000000000000000
R10: 0000000000000100 R11: 0000000000000000 R12: 0000000000000000
R13: ffffc9000479b190 R14: ffff8804059a40a0 R15: 000000000000000a
FS:  00007fe8a63a6700(0000) GS:ffff8804364c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000000006c CR3: 0000000414564000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process sport (pid: 3268, threadinfo ffff8804075dc000, task ffff880414bbe1c0)
Stack:
0000000000000000 ffff8804075c3d40 000000000000000e ffff880414d36780
<0> ffff8804075c3d98 ffff8804075dd918 ffffffff815671ac 000000010003503a
<0> ffff880410538700 0000000100000040 ffff8804121ca580 0000004012f9d89c

After comparing the disassembled code, we found that ixgbe_poll+0x9df corresponds 
to the line marked with an arrow in the following subroutine.

static bool ixgbe_clean_rx_irq_ps(struct ixgbe_q_vector *q_vector,
                                  struct ixgbe_ring *rx_ring,
                                  int budget)
{
        union ixgbe_adv_rx_desc *rx_desc;
        unsigned int total_rx_bytes = 0, total_rx_packets = 0;
        const int current_node = numa_node_id();
#ifdef IXGBE_FCOE
        struct ixgbe_adapter *adapter = q_vector->adapter;
        int ddp_bytes = 0;
#endif /* IXGBE_FCOE */
        u16 i = rx_ring->next_to_clean;
        u16 cleaned_count = ixgbe_desc_unused(rx_ring);

        rx_desc = IXGBE_RX_DESC(rx_ring, i);

        while (ixgbe_test_staterr(rx_desc, IXGBE_RXD_STAT_DD)) {
                union ixgbe_adv_rx_desc *next_rxd;
                struct ixgbe_rx_buffer *rx_buffer_info;
                struct sk_buff *skb;
                bool pkt_is_rsc, force_alloc = false;

                rx_buffer_info = &rx_ring->rx_buffer_info[i];
                skb = rx_buffer_info->skb;
                rx_buffer_info->skb = NULL;

                /*
                 * This memory barrier is needed to keep us from reading
                 * any other fields out of the rx_desc until we know the
                 * RXD_STAT_DD bit is set
                 */
                rmb();

                pkt_is_rsc = ixgbe_get_rsc_state(rx_ring, rx_desc);

                prefetch(skb->data);

                /* pull the header of the skb in if no data is already present */
                if (!skb_is_nonlinear(skb)) {   <-- This is where the crash occurred.
                        __skb_put(skb, ixgbe_get_hlen(rx_ring, rx_desc));


The C source code shows that skb is dereferenced (via prefetch(skb->data)) before 
skb_is_nonlinear() is called, which raised the question of why the crash did not 
occur on the prefetch line shown below.



                prefetch(skb->data);                  <--- line 2025

                /* pull the header of the skb in if no data is already present */
                if (!skb_is_nonlinear(skb)) {         <--- line 2028
                        __skb_put(skb, ixgbe_get_hlen(rx_ring, rx_desc));

Looking at the assembly code, the compiler seems to have generated code that 
executes line 2028 before line 2025.

ixgbe_clean_rx_irq_ps():
/work/sla/kernel_trunk/drivers/net/ixgbe/ixgbe_main.c:2028
    705f:       41 8b 74 24 6c          mov    0x6c(%r12),%esi
/work/sla/kernel_trunk/drivers/net/ixgbe/ixgbe_main.c:2025
    7064:       49 8b 8c 24 b8 01 00    mov    0x1b8(%r12),%rcx
    706b:       00
/work/sla/kernel_trunk/drivers/net/ixgbe/ixgbe_main.c:2028
    706c:       85 f6                   test   %esi,%esi
/work/sla/kernel_trunk/drivers/net/ixgbe/ixgbe_main.c:2025
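
As a side note, the faulting offset is also consistent with what skb_is_nonlinear() 
actually does. As far as we can tell, on this kernel it is just a trivial inline 
that reads skb->data_len (the definition below is from our reading of the 
2.6.32-era include/linux/skbuff.h), so calling it with skb == NULL faults at the 
offset of data_len, which we believe is what CR2 = 000000000000006c above 
corresponds to.

static inline int skb_is_nonlinear(const struct sk_buff *skb)
{
        return skb->data_len;   /* with skb == NULL this is the faulting load */
}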

As can be seen in the same routine, rx_buffer_info->skb is assigned by the 
following lines.

#ifndef IXGBE_NO_LRO
                if (ixgbe_can_lro(rx_ring, rx_desc, skb)) {
                        ixgbe_lro_receive(q_vector, skb);
                        rx_buffer_info->skb = ixgbe_lro_recycled_skb(q_vector);

                } else {
#endif

Note that ixgbe_lro_recycled_skb() can return NULL if 
q_vector->lrolist->recycled is empty.
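
To spell out the sequence we suspect (a sketch based on our reading of the code, 
not a verified trace; in particular, the assumption that the NULL pointer stored 
back into the ring entry is what gets picked up on a later cleaning pass is ours):

                /* cleaning pass over buffer index i with the recycled list
                 * empty (e.g. shortly after boot): NULL is written back
                 * into the ring entry */
                rx_buffer_info->skb = ixgbe_lro_recycled_skb(q_vector);

                /* later pass over the same index, once hardware has written
                 * back a new descriptor for it */
                skb = rx_buffer_info->skb;      /* NULL */
                rx_buffer_info->skb = NULL;
                ...
                prefetch(skb->data);            /* line 2025 */
                if (!skb_is_nonlinear(skb))     /* line 2028: NULL plus the
                                                 * data_len offset gives the
                                                 * oops above */
                        __skb_put(skb, ixgbe_get_hlen(rx_ring, rx_desc));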


The crash is not very reproducible; we only see it a few times, right after the 
machines are rebooted. Once a machine has been running in a warm state for a 
while, we are unlikely to see the problem. Have you heard of any similar reports?



There are two ways we are thinking of fixing this bug.



First approach

---------------

Please note that ixgbe_clean_rx_irq_bb() deals with this problem differently: it 
checks whether skb is NULL and, if it is, first tries to get an skb from the 
recycle queue; if that fails, it calls netdev_alloc_skb_ip_align() to allocate 
one. Would you say the following code (NULL_POINTER_FIX) is a proper way to fix 
this problem?



        while (ixgbe_test_staterr(rx_desc, IXGBE_RXD_STAT_DD)) {
                union ixgbe_adv_rx_desc *next_rxd;
                struct ixgbe_rx_buffer *rx_buffer_info;
                struct sk_buff *skb;
                bool pkt_is_rsc;

                rx_buffer_info = &rx_ring->rx_buffer_info[i];
                skb = rx_buffer_info->skb;
                rx_buffer_info->skb = NULL;

                /*
                 * This memory barrier is needed to keep us from reading
                 * any other fields out of the rx_desc until we know the
                 * RXD_STAT_DD bit is set
                 */
                rmb();

                pkt_is_rsc = ixgbe_get_rsc_state(rx_ring, rx_desc);

#define NULL_POINTER_FIX
#ifdef NULL_POINTER_FIX

                if (!skb) {
                        printk("%s:%d no skb\n", __FUNCTION__, __LINE__);
#ifndef IXGBE_NO_LRO
                        /* retrieve any recycled skbs first before allocating */
                        skb = ixgbe_lro_recycled_skb(q_vector);
                        if (!skb)
#endif
                                skb = netdev_alloc_skb_ip_align(netdev_ring(rx_ring),
                                                                IXGBE_RX_HDR_SIZE);
                        if (!skb) {
                                printk("%s:%d cannot alloc skb\n",
                                       __FUNCTION__, __LINE__);
                                rx_ring->rx_stats.alloc_rx_buff_failed++;
                                break;
                        }

                        /*
                         * Delay unmapping of the first packet. It carries the
                         * header information, HW may still access the header
                         * after the writeback.  Only unmap it when EOP is
                         * reached
                         */
                        IXGBE_CB(skb)->dma = rx_buffer_info->page_dma;

                        /* initialize skb for ring */
                        skb_record_rx_queue(skb, ring_queue_index(rx_ring));

                }
#endif // NULL_POINTER_FIX

Second Approach
---------------------
Another approach we have in mind is to force ixgbe_alloc_rx_buffers_ps() to be 
called whenever the recycle queue is empty.

        while (ixgbe_test_staterr(rx_desc, IXGBE_RXD_STAT_DD)) {
...
                bool pkt_is_rsc, force_alloc = false;    // added force_alloc
...

#ifndef IXGBE_NO_LRO
                // force allocation whenever the recycle q is empty
                if (ixgbe_can_lro(rx_ring, rx_desc, skb)) {
                        ixgbe_lro_receive(q_vector, skb);
                        rx_buffer_info->skb = ixgbe_lro_recycled_skb(q_vector);
                        if (rx_buffer_info->skb == NULL) {
                                rx_buffer_info->dma = 0; /* XXX */
                                printk(KERN_CRIT "%s: forcing allocation\n",
                                       __func__);
                                force_alloc = true;
                        }
...
                /* return some buffers to hardware, one at a time is too slow */
                // when force_alloc is true, we would get more skbs here
                if (force_alloc || cleaned_count >= IXGBE_RX_BUFFER_WRITE) {
                        ixgbe_alloc_rx_buffers_ps(rx_ring, cleaned_count);
                        cleaned_count = 0;
                }

...
      }


This crash is tough to reproduce, so it is very hard to test either of the 
proposed fixes above.
Would you please kindly suggest how best to fix this problem?

Thank you very much.

--Steven