Yes, transport_generic_handle_data(), which is called from ft_recv_write_data(), can call msleep_interruptible() only if the transport is active.
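
(Aside, not part of the original mail: a minimal C sketch of the pattern under discussion, under simplified assumptions. transport_cmd_is_active() and transport_queue_data_in() are hypothetical stand-ins; only the call chain and the msleep_interruptible() call mirror the thread.)

/*
 * Illustrative sketch only -- not the actual target_core_mod source.
 * fcoe_rcv() runs in softirq context; when the frame is handled inline,
 * the chain fc_exch_recv() -> ft_recv_write_data() ->
 * transport_generic_handle_data() can reach msleep_interruptible(),
 * which may schedule() and so triggers "scheduling while atomic".
 */
#include <linux/delay.h>

struct se_cmd;                                          /* from the target core headers */
bool transport_cmd_is_active(struct se_cmd *cmd);       /* hypothetical helper */
int transport_queue_data_in(struct se_cmd *cmd);        /* hypothetical helper */

static int handle_data_sketch(struct se_cmd *cmd)
{
	/* Sleeping wait: fine from a kthread, illegal from softirq context. */
	while (!transport_cmd_is_active(cmd))
		msleep_interruptible(10);

	return transport_queue_data_in(cmd);
}
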
FYI, this msleep was not introduced by my patch; it has been there all along. I agree with both of Joe's suggestions (fcoe_rcv() should always hand the frame to the processing thread, and TCM should not block the per-CPU receive thread). Will let Nick comment on that.

Thanks,
--
Kiran P.

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Joe Eykholt
Sent: Thursday, November 11, 2010 11:52 AM
To: Jansen, Frank
Cc: [email protected]
Subject: Re: [Open-FCoE] transport_generic_handle_data - BUG: scheduling while atomic

On 11/11/10 11:41 AM, Jansen, Frank wrote:
> Greetings!
>
> I'm running 2.6.36 with Kiran Patil's patches from 10/28/10.
>
> I have 4 logical volumes configured over fcoe:
>
> [r...@dut ~]# tcm_node --listhbas
> \------> iblock_0
> HBA Index: 1 plugin: iblock version: v4.0.0-rc5
> \-------> r0_lun3
> Status: ACTIVATED Execute/Left/Max Queue Depth: 0/32/32
> SectorSize: 512 MaxSectors: 1024
> iBlock device: dm-4 UDEV PATH: /dev/vg_R0_p1/lv_R0_p1_l3
> Major: 253 Minor: 4 CLAIMED: IBLOCK
> udev_path: /dev/vg_R0_p1/lv_R0_p1_l3
> \-------> r0_lun2
> Status: ACTIVATED Execute/Left/Max Queue Depth: 0/32/32
> SectorSize: 512 MaxSectors: 1024
> iBlock device: dm-3 UDEV PATH: /dev/vg_R0_p1/lv_R0_p1_l2
> Major: 253 Minor: 3 CLAIMED: IBLOCK
> udev_path: /dev/vg_R0_p1/lv_R0_p1_l2
> \-------> r0_lun1
> Status: ACTIVATED Execute/Left/Max Queue Depth: 0/32/32
> SectorSize: 512 MaxSectors: 1024
> iBlock device: dm-2 UDEV PATH: /dev/vg_R0_p1/lv_R0_p1_l1
> Major: 253 Minor: 2 CLAIMED: IBLOCK
> udev_path: /dev/vg_R0_p1/lv_R0_p1_l1
> \-------> r0_lun0
> Status: ACTIVATED Execute/Left/Max Queue Depth: 0/32/32
> SectorSize: 512 MaxSectors: 1024
> iBlock device: dm-1 UDEV PATH: /dev/vg_R0_p1/lv_R0_p1_l0
> Major: 253 Minor: 1 CLAIMED: IBLOCK
> udev_path: /dev/vg_R0_p1/lv_R0_p1_l0
>
> When any significant I/O load is put on any of the devices, I receive
> a flood of the following messages:
>
>> Nov 11 13:46:09 dut kernel: BUG: scheduling while atomic:
>> LIO_iblock/4439/0x00000101
>> Nov 11 13:46:09 dut kernel: Modules linked in: fcoe libfcoe
>> target_core_stgt target_core_pscsi target_core_file target_core_iblock
>> ipt_MASQUERADE iptable_nat nf_nat bridge stp llc autofs4 tcm_fc libfc
>> scsi_transport_fc scsi_tgt target_core_mod configfs sunrpc ipv6
>> dm_mirror dm_region_hash dm_log kvm_intel kvm uinput ixgbe ioatdma
>> iTCO_wdt ses enclosure i2c_i801 i2c_core iTCO_vendor_support mdio sg
>> igb dca pcspkr evbug evdev ext4 mbcache jbd2 sd_mod crc_t10dif
>> pata_acpi ata_generic mpt2sas scsi_transport_sas ata_piix raid_class
>> dm_mod [last unloaded: speedstep_lib]
>> Nov 11 13:46:09 dut kernel: Pid: 4439, comm: LIO_iblock Not tainted
>> 2.6.36+ #1
>> Nov 11 13:46:09 dut kernel: Call Trace:
>> Nov 11 13:46:09 dut kernel: <IRQ> [<ffffffff8104fb96>]
>> __schedule_bug+0x66/0x70
>> Nov 11 13:46:09 dut kernel: [<ffffffff8149779c>] schedule+0xa2c/0xa60
>> Nov 11 13:46:09 dut kernel: [<ffffffff81497d73>]
>> schedule_timeout+0x173/0x2e0
>> Nov 11 13:46:09 dut kernel: [<ffffffff81071200>] ?
>> process_timeout+0x0/0x10
>> Nov 11 13:46:09 dut kernel: [<ffffffff81497f3e>]
>> schedule_timeout_interruptible+0x1e/0x20
>> Nov 11 13:46:09 dut kernel: [<ffffffff81072b39>]
>> msleep_interruptible+0x39/0x50
>> Nov 11 13:46:09 dut kernel: [<ffffffffa033ebfa>]
>> transport_generic_handle_data+0x2a/0x80 [target_core_mod]
>> Nov 11 13:46:09 dut kernel: [<ffffffffa03c33ee>]
>> ft_recv_write_data+0x1fe/0x2b0 [tcm_fc]
>> Nov 11 13:46:09 dut kernel: [<ffffffffa03c13cb>] ft_recv_seq+0x8b/0xc0
>> [tcm_fc]
>> Nov 11 13:46:09 dut kernel: [<ffffffffa03a0e1f>]
>> fc_exch_recv+0x61f/0xe20 [libfc]
>> Nov 11 13:46:09 dut kernel: [<ffffffff813c1123>] ?
>> skb_copy_bits+0x63/0x2c0
>> Nov 11 13:46:09 dut kernel: [<ffffffff813c15ea>] ?
>> __pskb_pull_tail+0x26a/0x360
>> Nov 11 13:46:09 dut kernel: [<ffffffffa015b86d>]
>> fcoe_recv_frame+0x18d/0x340 [fcoe]
>> Nov 11 13:46:09 dut kernel: [<ffffffff813c13df>] ?
>> __pskb_pull_tail+0x5f/0x360
>> Nov 11 13:46:09 dut kernel: [<ffffffff813c0404>] ?
>> __netdev_alloc_skb+0x24/0x50
>> Nov 11 13:46:09 dut kernel: [<ffffffffa015e52a>] fcoe_rcv+0x2aa/0x44c
>> [fcoe]
>> Nov 11 13:46:09 dut kernel: [<ffffffff8113c897>] ?
>> __kmalloc_node_track_caller+0x67/0xe0
>> Nov 11 13:46:09 dut kernel: [<ffffffff813c0404>] ?
>> __netdev_alloc_skb+0x24/0x50
>> Nov 11 13:46:09 dut kernel: [<ffffffff813cd39a>]
>> __netif_receive_skb+0x41a/0x5d0
>> Nov 11 13:46:09 dut kernel: [<ffffffff81012699>] ? read_tsc+0x9/0x20
>> Nov 11 13:46:09 dut kernel: [<ffffffff813ceab8>]
>> netif_receive_skb+0x58/0x80
>> Nov 11 13:46:09 dut kernel: [<ffffffff813cec20>]
>> napi_skb_finish+0x50/0x70
>> Nov 11 13:46:09 dut kernel: [<ffffffff813cf1a5>]
>> napi_gro_receive+0xc5/0xd0
>> Nov 11 13:46:09 dut kernel: [<ffffffffa0207a1f>]
>> ixgbe_clean_rx_irq+0x31f/0x840 [ixgbe]
>> Nov 11 13:46:09 dut kernel: [<ffffffffa02083a6>]
>> ixgbe_clean_rxtx_many+0x136/0x240 [ixgbe]
>> Nov 11 13:46:09 dut kernel: [<ffffffff813cf382>]
>> net_rx_action+0x102/0x250
>> Nov 11 13:46:09 dut kernel: [<ffffffff81068af2>]
>> __do_softirq+0xb2/0x240
>> Nov 11 13:46:09 dut kernel: [<ffffffff8100c07c>] call_softirq+0x1c/0x30
>> Nov 11 13:46:09 dut kernel: <EOI> [<ffffffff8100db25>] ?
>> do_softirq+0x65/0xa0
>> Nov 11 13:46:09 dut kernel: [<ffffffff81068664>]
>> local_bh_enable+0x94/0xa0
>> Nov 11 13:46:09 dut kernel: [<ffffffff813cdfd3>]
>> dev_queue_xmit+0x143/0x3b0
>> Nov 11 13:46:09 dut kernel: [<ffffffffa015d96e>] fcoe_xmit+0x30e/0x520
>> [fcoe]
>> Nov 11 13:46:09 dut kernel: [<ffffffffa03a2a13>] ?
>> _fc_frame_alloc+0x33/0x90 [libfc]
>> Nov 11 13:46:09 dut kernel: [<ffffffffa039f904>] fc_seq_send+0xb4/0x140
>> [libfc]
>> Nov 11 13:46:09 dut kernel: [<ffffffffa03c1722>]
>> ft_write_pending+0x112/0x160 [tcm_fc]
>> Nov 11 13:46:09 dut kernel: [<ffffffffa0347800>]
>> transport_generic_new_cmd+0x280/0x2b0 [target_core_mod]
>> Nov 11 13:46:09 dut kernel: [<ffffffffa03479d4>]
>> transport_processing_thread+0x1a4/0x7c0 [target_core_mod]
>> Nov 11 13:46:09 dut kernel: [<ffffffff810835d0>] ?
>> autoremove_wake_function+0x0/0x40
>> Nov 11 13:46:09 dut kernel: [<ffffffffa0347830>] ?
>> transport_processing_thread+0x0/0x7c0 [target_core_mod]
>> Nov 11 13:46:09 dut kernel: [<ffffffff81082f36>] kthread+0x96/0xa0
>> Nov 11 13:46:09 dut kernel: [<ffffffff8100bf84>]
>> kernel_thread_helper+0x4/0x10
>> Nov 11 13:46:09 dut kernel: [<ffffffff81082ea0>] ? kthread+0x0/0xa0
>> Nov 11 13:46:09 dut kernel: [<ffffffff8100bf80>] ?
>> kernel_thread_helper+0x0/0x10
>
> I started noticing these issues first when I ran I/O with larger
> filesizes (appr. 25GB), but I'm thinking that might be a red herring.
> I'll rebuild the kernel and tools to make sure nothing is out of sorts
> and will report on any additional findings.
>
> Thanks,
>
> Frank

FCP data frames are coming in at the interrupt level, and TCM expects to be called in a thread or non-interrupt context, since transport_generic_handle_data() may sleep.

A quick workaround would be to change the fast path in fcoe_rcv() so that data always goes through the per-cpu receive threads. That avoids part of the problem, but isn't anything like the right fix. It doesn't seem good to let TCM block FCoE's per-cpu receive thread either.

Here's a quick change if you want to just work around the problem. I haven't tested it:

diff --git a/drivers/scsi/fcoe/fcoe.c b/drivers/scsi/fcoe/fcoe.c
index feddb53..8f854cd 100644
--- a/drivers/scsi/fcoe/fcoe.c
+++ b/drivers/scsi/fcoe/fcoe.c
@@ -1285,6 +1285,7 @@ int fcoe_rcv(struct sk_buff *skb, struct net_device *netdev,
 	 * BLOCK softirq context.
 	 */
 	if (fh->fh_type == FC_TYPE_FCP &&
+	    0 &&
 	    cpu == smp_processor_id() &&
 	    skb_queue_empty(&fps->fcoe_rx_list)) {
 		spin_unlock_bh(&fps->fcoe_rx_list.lock);
---

Cheers,
Joe
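
(Aside, not from the thread: a minimal sketch of the direction Joe hints at -- have tcm_fc defer the data handling to process context via a workqueue, so neither the softirq path nor fcoe's per-cpu receive thread blocks in transport_generic_handle_data(). The ft_cmd_sketch structure and helper names are assumptions for illustration, not the real tcm_fc code.)

#include <linux/workqueue.h>

struct se_cmd;                                          /* from the target core headers */
extern int transport_generic_handle_data(struct se_cmd *);

/* Hypothetical per-command wrapper; the real struct ft_cmd differs. */
struct ft_cmd_sketch {
	struct work_struct	work;
	struct se_cmd		*se_cmd;
};

static void ft_data_work_fn(struct work_struct *work)
{
	struct ft_cmd_sketch *cmd =
		container_of(work, struct ft_cmd_sketch, work);

	/* Process context: sleeping in the target core is now allowed. */
	transport_generic_handle_data(cmd->se_cmd);
}

/*
 * Would be called at the end of ft_recv_write_data() instead of calling
 * transport_generic_handle_data() directly from the receive path.
 */
static void ft_queue_data_work(struct ft_cmd_sketch *cmd)
{
	INIT_WORK(&cmd->work, ft_data_work_fn);
	schedule_work(&cmd->work);	/* or a dedicated tcm_fc workqueue */
}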
