On Tue, Mar 10, 2015 at 8:51 PM, Nick Krause <xerofo...@gmail.com> wrote:
> On Tue, Mar 10, 2015 at 4:43 AM, Jeff Kirsher
> <jeffrey.t.kirs...@intel.com> wrote:
>> On Tue, 2015-03-10 at 15:09 +0800, Bob Liu wrote:
>>> Hi Jeff,
>>>
>>> Recently I hit an issue that is likely related to the ixgbe driver, which
>>> I'm not familiar with.
>>> I'm not subscribed to linux-net, so I'm just sending this email to you.
>>
>> Adding the e1000-devel mailing list (no need to subscribe), because that
>> will reach all of the Intel Wired LAN developers.
>>
>>>
>>> It happened while running a block benchmark with an iSCSI disk as the backend.
>>> I got the panic below at put_page_testzero(); at that time ixgbe was
>>> freeing skb pages in __skb_frag_unref(), but page->_count was
>>> already 0.
>>>
>>> I'd like to know: is it possible that the ixgbe driver notifies the
>>> upper layer of "transmit packet complete" before ixgbe_clean_tx_irq()?
>>> In that case, the upper layer may free the page before the ixgbe driver does.
>>> Or do you have any other suggestions on this bug?
>>> Thanks a lot!
>>> --Bob
>>>
>>> ------------[ cut here ]------------
>>> kernel BUG at include/linux/mm.h:288!
>>> invalid opcode: 0000 [#1] SMP
>>> Modules linked in: dm_queue_length ib_iser rdma_cm ib_cm iw_cm ib_sa
>>> ib_mad ib_core ib_addr iscsi_tcp xt_mac xt_nat nf_conntrack_netlink
>>> xt_conntrack ipt_REJECT xt_TCPMSS xt_comment xt_connmark iptable_raw
>>> xt_REDIRECT ext4 jbd2 xt_state xt_NFQUEUE iptable_nat nf_conntrack_ipv4
>>> nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_gre gre
>>> nfnetlink_queue nfnetlink ip6table_filter ip6_tables ebtable_nat
>>> ebtables softdog iptable_filter ip_tables xen_pciback xen_netback
>>> xen_blkback xen_gntalloc xen_gntdev xen_evtchn xenfs xen_privcmd 8021q
>>> garp bridge stp llc sunrpc bonding mlx4_en mlx4_core ipv6 ipmi_devintf
>>> ipmi_si ipmi_msghandler vhost_net macvtap macvlan tun iTCO_wdt
>>> iTCO_vendor_support coretemp freq_table mperf intel_powerclamp
>>> ghash_clmulni_intel microcode pcspkr i2c_i801 i2c_core lpc_ich mfd_core
>>> shpchp ioatdma sg ext3 jbd mbcache dm_round_robin sd_mod crc_t10dif
>>> aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul ahci libahci
>>> megaraid_sas ixgbe hwmon dca dm_multipath dm_mirror dm_region_hash
>>> dm_log dm_mod crc32c_intel be2iscsi bnx2i cnic uio cxgb4i cxgb4 cxgb3i
>>> libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi
>>> scsi_transport_iscsi [last unloaded: bonding]
>>> CPU 0
>>> Pid: 31309, comm: kworker/0:0 Tainted: G        W
>>> RIP: e030:[<ffffffff8113d481>]  [<ffffffff8113d481>] put_page+0x31/0x50
>>> RSP: e02b:ffff880278e03d10  EFLAGS: 00010246
>>> RAX: 0000000000000000 RBX: ffff8802692257b8 RCX: 00000000ffffffff
>>> RDX: ffff88026ea4b2c0 RSI: 0000000000000200 RDI: ffffea00088670c0
>>> RBP: ffff880278e03d10 R08: ffff88026a6e4500 R09: ffff880270a25098
>>> R10: 0000000000000001 R11: ffff880278e03cf0 R12: 0000000000000006
>>> R13: 00000000ffffff8e R14: ffff880270a25098 R15: ffff88026c95d9f0
>>> FS:  00007fbb497cf700(0000) GS:ffff880278e00000(0000) knlGS:0000000000000000
>>> CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> CR2: 00000000007aaed0 CR3: 00000002703c2000 CR4: 0000000000042660
>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>> Process kworker/0:0 (pid: 31309, threadinfo ffff88020c608000, task
>>> ffff880200e24140)
>>> Stack:
>>>  ffff880278e03d30 ffffffff814c3855 ffff8802692257b8 ffff8802692257b8
>>>  ffff880278e03d50 ffffffff814c38ee 0000000000000000 ffffc9001188faa0
>>>  ffff880278e03d70 ffffffff814c3ff1 ffffc9001188faa0 ffff88026c95d8e0
>>> Call Trace:
>>>  <IRQ>
>>>  [<ffffffff814c3855>] skb_release_data+0x75/0xf0
>>>  [<ffffffff814c38ee>] __kfree_skb+0x1e/0xa0
>>>  [<ffffffff814c3ff1>] consume_skb+0x31/0x70
>>>  [<ffffffff814ce6ed>] dev_kfree_skb_any+0x3d/0x50
>>>  [<ffffffffa01d0bdc>] ixgbe_clean_tx_irq+0xac/0x530 [ixgbe]
>>>  [<ffffffffa01d10b3>] ixgbe_poll+0x53/0x1a0 [ixgbe]
>>>  [<ffffffff814d3d05>] net_rx_action+0x105/0x2b0
>>>  [<ffffffff81066587>] __do_softirq+0xd7/0x240
>>>  [<ffffffff815a7c5c>] call_softirq+0x1c/0x30
>>>  [<ffffffff810174b5>] do_softirq+0x65/0xa0
>>>  [<ffffffff8106636d>] irq_exit+0xbd/0xe0
>>>  [<ffffffff8133d3e5>] xen_evtchn_do_upcall+0x35/0x50
>>>  [<ffffffff815a7cbe>] xen_do_hypervisor_callback+0x1e/0x30
>>>  <EOI>
>>>  [<ffffffff8100128a>] ? xen_hypercall_grant_table_op+0xa/0x20
>>>  [<ffffffff8100128a>] ? xen_hypercall_grant_table_op+0xa/0x20
>>>  [<ffffffff8133a4e6>] ? gnttab_unmap_refs+0x26/0x70
>>>  [<ffffffff8133a5ba>] ? __gnttab_unmap_refs_async+0x8a/0xb0
>>>  [<ffffffff8133a672>] ? gnttab_unmap_work+0x22/0x30
>>>  [<ffffffff8107bf10>] ? process_one_work+0x180/0x420
>>>  [<ffffffff8107df4e>] ? worker_thread+0x12e/0x390
>>>  [<ffffffff8107de20>] ? manage_workers+0x180/0x180
>>>  [<ffffffff8108329e>] ? kthread+0xce/0xe0
>>>  [<ffffffff810039ee>] ? xen_end_context_switch+0x1e/0x30
>>>  [<ffffffff810831d0>] ? kthread_freezable_should_stop+0x70/0x70
>>>  [<ffffffff815a682c>] ? ret_from_fork+0x7c/0xb0
>>>  [<ffffffff810831d0>] ? kthread_freezable_should_stop+0x70/0x70
>>> Code: 66 66 90 66 f7 07 00 c0 75 25 8b 47 1c 85 c0 74 1a f0 ff 4f 1c 0f
>>> 94 c0 84 c0 75 06 c9 c3 0f 1f 40 00 e8 43 fd ff ff c9 66 90 c3 <0f> 0b
>>> eb fe 66 66 2e 0f 1f 84 00 00 00 00 00 e8 5b fd ff ff c9
>>> RIP  [<ffffffff8113d481>] put_page+0x31/0x50
>>>  RSP <ffff880278e03d10>
>>> ---[ end trace 9f93fe018444fc09 ]---
>>> Kernel panic - not syncing: Fatal exception in interrupt

> Would you mind sending your actual test cases to prove that the page
> is free when calling __skb_frag_unref

I'm afraid I can't. It's a complex test program that is difficult to run in
an external environment because of its many dependencies.

> from the ixgbe code? This seems like a plausible place for things to go
> wrong, and we would need synchronization
> for packets passed to dev_kfree_skb_any in ixgbe_clean_tx_irq if this is
> happening.

Since it's an iSCSI device, I'd like to know whether the ixgbe driver can
notify the upper layer of "transmit packet complete" before
dev_kfree_skb_any() is called.

The situation I'm worried about is like this:

block driver:                               net driver:
submit_bio()
                                            ixgbe_xmit_frame_ring()
                                            notify block driver tx done

bio_done()
{
    free_page()
}

                                            dev_kfree_skb_any()
                                            ^^^ Here the page was
                                                already freed by the
                                                block driver.
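
One way to reason about the timeline above is a small userspace model of
the page reference count. This is only a sketch under stated assumptions:
toy_page, toy_get_page(), toy_put_page() and tx_timeline_ok() are
hypothetical names, not kernel or ixgbe APIs, and the model is
single-threaded. It assumes the block layer owns one reference and that the
transmit path may or may not take its own reference for the skb fragment.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page; only the reference count is modelled. */
struct toy_page {
    int count;
};

/* Take an extra reference. Doing so on a page with no references is a bug. */
static void toy_get_page(struct toy_page *p)
{
    assert(p->count > 0);
    p->count++;
}

/*
 * Drop one reference. Returns false if the count is already 0, which is
 * the condition the check behind the reported BUG is assumed to catch.
 */
static bool toy_put_page(struct toy_page *p)
{
    if (p->count == 0)
        return false;
    p->count--;
    return true;
}

/*
 * Replay the timeline from the mail (hypothetical model):
 *   1. the block layer owns one reference to the page;
 *   2. the transmit path attaches the page to an skb fragment and, if
 *      frag_holds_ref is true, takes its own reference;
 *   3. the block layer is told "tx done" early and drops its reference;
 *   4. the TX clean-up path later drops the fragment's reference.
 * Returns true if every put saw a non-zero count.
 */
bool tx_timeline_ok(bool frag_holds_ref)
{
    struct toy_page page = { .count = 1 };  /* step 1 */

    if (frag_holds_ref)                     /* step 2 */
        toy_get_page(&page);

    if (!toy_put_page(&page))               /* step 3 */
        return false;

    return toy_put_page(&page);             /* step 4 */
}
```

In this model tx_timeline_ok(true) returns true and tx_timeline_ok(false)
returns false. That suggests an early completion notification on its own
would not produce the zero count seen in the trace; a missing reference (or
an extra put) somewhere would also be needed. Whether that matches the real
code paths here is not established by this sketch.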

Thanks!
-Bob


-- 
Regards,
--Bob

_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit
http://communities.intel.com/community/wired
