Hi Alex,

Based on the trace below, it looks like you somehow managed to transmit
on a ring while it was disabled.

Would it be possible for you to try testing this with the latest ixgbevf
driver from e1000.sf.net to see if it has the same issue?  My concern is
that there have been a number of changes made to this driver over the
past several months and it is possible that this issue may already be
resolved in either a newer kernel or a newer driver.

Thanks,

Alex

On 04/27/2014 06:06 AM, Alex Lyakas wrote:
> Hi Alexander, Greg,
> 
> We had a crash in ixgbevf_xmit_frame dereferencing a NULL pointer.
> Setup is like this: 82599EB Intel NIC, spawning 32 VFs on each port. Two
> VFs from different ports are assigned to the physical machine (not to a
> VM) and 8021q interfaces are created on top of them (using vconfig).
> Finally, a bond is created on top of the 8021q interfaces. Bond is in
> active-backup mode with failover-mac setting set to 1. Kernel is stock
> ubuntu-precise, 3.2.0-29-generic #46.
> 
> The crash happened when the bond lost its last active interface:
> 
> <6>[3286007.705734] bonding: bebond: link status definitely down for
> interface be10G1.1801, disabling it
> <6>[3286007.705749] bonding: bebond: now running without any active
> interface !
> <1>[3286007.805601] BUG: unable to handle kernel NULL pointer
> dereference at 0000000000000018
> <1>[3286007.845173] IP: [<ffffffffa01003c8>]
> ixgbevf_xmit_frame+0x3e8/0xbd0 [ixgbevf]
> <4>[3286007.845190] PGD 0
> <0>[3286007.845193] Oops: 0002 [#1] SMP
> <4>[3286007.845198] CPU 11
> <4>[3286007.845200] Modules linked in: softdog nbd btrfs zlib_deflate
> ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs reiserfs ext2 pci_stub
> nls_iso8859_1 nls_cp437 vfat fat drbd lru_cache ip6table_filter
> ip6_tables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4
> nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM
> iptable_mangle xt_tcpudp iptable_filter ip_tables x_tables kvm_intel
> kvm(O) vesafb scst_vdisk(O) iscsi_scst(O) scst(O) libcrc32c bridge
> sb_edac dm_iostat(O) edac_core joydev ioatdma dm_multipath wmi acpi_pad
> mac_hid bonding 8021q garp stp lp parport ses enclosure usbhid hid
> ixgbevf(O) isci libsas igb ixgbe(O) scsi_transport_sas megaraid_sas
> dm_raid45 dca xor dm_mirror dm_region_hash dm_log [last unloaded:
> scsi_transport_iscsi]
> <4>[3286007.845257]
> <4>[3286007.845262] Pid: 31598, comm: kworker/u:2 Tainted: G           O
> 3.2.0-29-generic #46-Ubuntu Supermicro
> X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+
> <4>[3286007.845269] RIP: 0010:[<ffffffffa01003c8>]  [<ffffffffa01003c8>]
> ixgbevf_xmit_frame+0x3e8/0xbd0 [ixgbevf]
> <4>[3286007.845278] RSP: 0018:ffff8803a17379e0  EFLAGS: 00010283
> <4>[3286007.845281] RAX: 000000000000002a RBX: 0000000000000000 RCX:
> 000000000000002a
> <4>[3286007.845284] RDX: ffff880c26d86780 RSI: ffff880c26c1ec80 RDI:
> ffff880b901ec300
> <4>[3286007.845287] RBP: ffff8803a1737a80 R08: ffff880c2954e000 R09:
> 0000000000000000
> <4>[3286007.845290] R10: ffff880c26d86780 R11: 0000000000000000 R12:
> 000000000000002a
> <4>[3286007.845293] R13: 0000000000000000 R14: 0000000000000000 R15:
> 000000000000002a
> <4>[3286007.845297] FS:  0000000000000000(0000)
> GS:ffff880c4f360000(0000) knlGS:0000000000000000
> <4>[3286007.845301] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> <4>[3286007.845304] CR2: 0000000000000018 CR3: 0000000001c05000 CR4:
> 00000000000406e0
> <4>[3286007.845307] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> <4>[3286007.845310] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> 0000000000000400
> <4>[3286007.845314] Process kworker/u:2 (pid: 31598, threadinfo
> ffff8803a1736000, task ffff8803d3a18000)
> <0>[3286007.845317] Stack:
> <4>[3286007.845319]  ffff8803a1737a00 0000000000000000 0000002aa1730000
> ffff880c26d86000
> <4>[3286007.845327]  ffff880c2954e090 0000000000000000 ffff880c26d86780
> ffff880c2954e000
> <4>[3286007.845334]  0000000000000000 0000000000000000 ffff880c26c1ec80
> 000000000000002a
> <0>[3286007.845340] Call Trace:
> <4>[3286007.845355]  [<ffffffff815426d6>] dev_hard_start_xmit+0x256/0x540
> <4>[3286007.845364]  [<ffffffff8155f33e>] sch_direct_xmit+0xfe/0x1d0
> <4>[3286007.845370]  [<ffffffff81542af7>] dev_queue_xmit+0x137/0x420
> <4>[3286007.845381]  [<ffffffffa013207c>] bond_dev_queue_xmit+0x2c/0x70
> [bonding]
> <4>[3286007.845390]  [<ffffffffa01325ce>] __bond_start_xmit+0x1ce/0x250
> [bonding]
> <4>[3286007.845399]  [<ffffffffa01326bb>] bond_start_xmit+0x6b/0x80
> [bonding]
> <4>[3286007.845405]  [<ffffffff815426d6>] dev_hard_start_xmit+0x256/0x540
> <4>[3286007.845413]  [<ffffffff815a16d5>] ? arp_create+0x65/0x280
> <4>[3286007.845420]  [<ffffffff81542c6a>] dev_queue_xmit+0x2aa/0x420
> <4>[3286007.845426]  [<ffffffff815a1aa8>] arp_xmit+0x58/0x60
> <4>[3286007.845431]  [<ffffffff815a1af3>] arp_send+0x43/0x50
> <4>[3286007.845436]  [<ffffffff815a6b81>] inetdev_event+0x101/0x2f0
> <4>[3286007.845445]  [<ffffffff8165d7fd>] notifier_call_chain+0x4d/0x70
> <4>[3286007.845456]  [<ffffffff81090116>] raw_notifier_call_chain+0x16/0x20
> <4>[3286007.845463]  [<ffffffff8153cab6>]
> call_netdevice_notifiers+0x36/0x60
> <4>[3286007.845468]  [<ffffffff8153d247>] netdev_bonding_change+0x17/0x20
> <4>[3286007.845477]  [<ffffffffa0133a5e>] bond_mii_monitor+0xce/0x190
> [bonding]
> <4>[3286007.845486]  [<ffffffffa0133990>] ?
> bond_miimon_commit+0x2d0/0x2d0 [bonding]
> <4>[3286007.845496]  [<ffffffff810849ea>] process_one_work+0x11a/0x480
> <4>[3286007.845503]  [<ffffffff81085794>] worker_thread+0x164/0x370
> <4>[3286007.845509]  [<ffffffff81085630>] ?
> manage_workers.isra.29+0x130/0x130
> <4>[3286007.845515]  [<ffffffff81089fbc>] kthread+0x8c/0xa0
> <4>[3286007.845523]  [<ffffffff81664034>] kernel_thread_helper+0x4/0x10
> <4>[3286007.845528]  [<ffffffff81089f30>] ? flush_kthread_worker+0xa0/0xa0
> <4>[3286007.845533]  [<ffffffff81664030>] ? gs_change+0x13/0x13
> 
> Looking at the code:
> (gdb) l *ixgbevf_xmit_frame + 0x3e8
> 0x33f8 is in ixgbevf_xmit_frame (/mnt/work/alex/Ubuntu-3.2.0-29.46/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c:2895).
> 
> 2890            len = min(skb_headlen(skb), total);
> 2891            while (len) {
> 2892                    tx_buffer_info = &tx_ring->tx_buffer_info[i];
> 2893                    size = min(len, (unsigned int)IXGBE_MAX_DATA_PER_TXD);
> 2894
> 2895                    tx_buffer_info->length = size;
> 2896                    tx_buffer_info->mapped_as_page = false;
> 2897                    tx_buffer_info->dma = dma_map_single(&adapter->pdev->dev,
> 2898                                                         skb->data + offset,
> 2899                                                         size, DMA_TO_DEVICE);
> 
> It looks like tx_buffer_info is NULL, so the write to its "length" field
> faults at address 0000000000000018.
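
To make the offset arithmetic concrete, here is a minimal sketch in C.
The struct is a hypothetical stand-in: its field order and sizes are
assumptions chosen so that "length" lands at offset 0x18, and it is not
copied from the driver source. The helper length_store_address() is
likewise made up for illustration. It only does integer arithmetic, so
nothing is actually dereferenced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the per-descriptor bookkeeping entry.
 * ASSUMPTION: three 8-byte fields precede "length"; the real layout in
 * the 3.2.0-29 driver may differ.
 */
struct tx_buffer_sketch {
	uint64_t skb;            /* stands in for a 64-bit pointer */
	uint64_t dma;            /* stands in for a 64-bit DMA address */
	uint64_t time_stamp;     /* stands in for a 64-bit timestamp */
	uint16_t length;
	uint16_t next_to_watch;
	uint16_t mapped_as_page;
};

/* Compile-time check that the assumed layout puts "length" at 0x18. */
static_assert(offsetof(struct tx_buffer_sketch, length) == 0x18,
	      "assumed layout must place length at offset 0x18");

/*
 * Address that a store to array[i].length would hit, given the address
 * of the array itself. Pure integer arithmetic, no memory access.
 */
uintptr_t length_store_address(uintptr_t array_base, size_t i)
{
	return array_base
	       + i * sizeof(struct tx_buffer_sketch)
	       + offsetof(struct tx_buffer_sketch, length);
}
```

Under these assumptions, length_store_address(0, 0) evaluates to 0x18,
which is the value reported in CR2. A nonzero index would add a multiple
of the struct size, so the fault address would be consistent with a NULL
array and i == 0.
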
> 
> Some googling suggested a possibly related problem in
> https://bugzilla.redhat.com/show_bug.cgi?id=862862 where interfaces had
> long names, so initializing the MSI-X interrupt names would trample over
> the "tx_ring" field (that's how I got your emails). Could this be related?
> 
> Thanks,
> Alex.
> 
> 


_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit
http://communities.intel.com/community/wired
