Yes, transport_generic_handle_data(), which is called from ft_recv_write_data(), can call msleep_interruptible() only if the transport is active.
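
(Aside, not part of the original mail: a minimal C sketch of the pattern under discussion, under simplified assumptions. transport_cmd_is_active() and transport_queue_data_in() are hypothetical stand-ins; only the call chain and the msleep_interruptible() call mirror the thread.)

/*
 * Illustrative sketch only -- not the actual target_core_mod source.
 * fcoe_rcv() runs in softirq context; when the frame is handled inline,
 * the chain fc_exch_recv() -> ft_recv_write_data() ->
 * transport_generic_handle_data() can reach msleep_interruptible(),
 * which may schedule() and so triggers "scheduling while atomic".
 */
#include <linux/delay.h>

struct se_cmd;                                          /* from the target core headers */
bool transport_cmd_is_active(struct se_cmd *cmd);       /* hypothetical helper */
int transport_queue_data_in(struct se_cmd *cmd);        /* hypothetical helper */

static int handle_data_sketch(struct se_cmd *cmd)
{
	/* Sleeping wait: fine from a kthread, illegal from softirq context. */
	while (!transport_cmd_is_active(cmd))
		msleep_interruptible(10);

	return transport_queue_data_in(cmd);
}
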
FYI, this msleep was not introduced by my patch; it has been there all along. I agree with both of Joe's suggestions (fcoe_rcv() should always hand the frame to the processing thread, and TCM should not block the per-CPU receive thread). Will let Nick comment on that.

Thanks,
--
Kiran P.

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Joe Eykholt
Sent: Thursday, November 11, 2010 11:52 AM
To: Jansen, Frank
Cc: [email protected]
Subject: Re: [Open-FCoE] transport_generic_handle_data - BUG: scheduling while atomic

On 11/11/10 11:41 AM, Jansen, Frank wrote:
> Greetings!
>
> I'm running 2.6.36 with Kiran Patil's patches from 10/28/10.
>
> I have 4 logical volumes configured over fcoe:
>
> [r...@dut ~]# tcm_node --listhbas
> \------> iblock_0
> HBA Index: 1 plugin: iblock version: v4.0.0-rc5
> \-------> r0_lun3
> Status: ACTIVATED Execute/Left/Max Queue Depth: 0/32/32
> SectorSize: 512 MaxSectors: 1024
> iBlock device: dm-4 UDEV PATH: /dev/vg_R0_p1/lv_R0_p1_l3
> Major: 253 Minor: 4 CLAIMED: IBLOCK
> udev_path: /dev/vg_R0_p1/lv_R0_p1_l3
> \-------> r0_lun2
> Status: ACTIVATED Execute/Left/Max Queue Depth: 0/32/32
> SectorSize: 512 MaxSectors: 1024
> iBlock device: dm-3 UDEV PATH: /dev/vg_R0_p1/lv_R0_p1_l2
> Major: 253 Minor: 3 CLAIMED: IBLOCK
> udev_path: /dev/vg_R0_p1/lv_R0_p1_l2
> \-------> r0_lun1
> Status: ACTIVATED Execute/Left/Max Queue Depth: 0/32/32
> SectorSize: 512 MaxSectors: 1024
> iBlock device: dm-2 UDEV PATH: /dev/vg_R0_p1/lv_R0_p1_l1
> Major: 253 Minor: 2 CLAIMED: IBLOCK
> udev_path: /dev/vg_R0_p1/lv_R0_p1_l1
> \-------> r0_lun0
> Status: ACTIVATED Execute/Left/Max Queue Depth: 0/32/32
> SectorSize: 512 MaxSectors: 1024
> iBlock device: dm-1 UDEV PATH: /dev/vg_R0_p1/lv_R0_p1_l0
> Major: 253 Minor: 1 CLAIMED: IBLOCK
> udev_path: /dev/vg_R0_p1/lv_R0_p1_l0
>
> When any significant I/O load is put on any of the devices, I receive
> a flood of the following messages:
>
>> Nov 11 13:46:09 dut kernel: BUG: scheduling while atomic:
>> LIO_iblock/4439/0x00000101
>> Nov 11 13:46:09 dut kernel: Modules linked in: fcoe libfcoe
>> target_core_stgt target_core_pscsi target_core_file target_core_iblock
>> ipt_MASQUERADE iptable_nat nf_nat bridge stp llc autofs4 tcm_fc libfc
>> scsi_transport_fc scsi_tgt target_core_mod configfs sunrpc ipv6
>> dm_mirror dm_region_hash dm_log kvm_intel kvm uinput ixgbe ioatdma
>> iTCO_wdt ses enclosure i2c_i801 i2c_core iTCO_vendor_support mdio sg
>> igb dca pcspkr evbug evdev ext4 mbcache jbd2 sd_mod crc_t10dif
>> pata_acpi ata_generic mpt2sas scsi_transport_sas ata_piix raid_class
>> dm_mod [last unloaded: speedstep_lib]
>> Nov 11 13:46:09 dut kernel: Pid: 4439, comm: LIO_iblock Not tainted
>> 2.6.36+ #1
>> Nov 11 13:46:09 dut kernel: Call Trace:
>> Nov 11 13:46:09 dut kernel: <IRQ> [<ffffffff8104fb96>]
>> __schedule_bug+0x66/0x70
>> Nov 11 13:46:09 dut kernel: [<ffffffff8149779c>] schedule+0xa2c/0xa60
>> Nov 11 13:46:09 dut kernel: [<ffffffff81497d73>]
>> schedule_timeout+0x173/0x2e0
>> Nov 11 13:46:09 dut kernel: [<ffffffff81071200>] ?
>> process_timeout+0x0/0x10
>> Nov 11 13:46:09 dut kernel: [<ffffffff81497f3e>]
>> schedule_timeout_interruptible+0x1e/0x20
>> Nov 11 13:46:09 dut kernel: [<ffffffff81072b39>]
>> msleep_interruptible+0x39/0x50
>> Nov 11 13:46:09 dut kernel: [<ffffffffa033ebfa>]
>> transport_generic_handle_data+0x2a/0x80 [target_core_mod]
>> Nov 11 13:46:09 dut kernel: [<ffffffffa03c33ee>]
>> ft_recv_write_data+0x1fe/0x2b0 [tcm_fc]
>> Nov 11 13:46:09 dut kernel: [<ffffffffa03c13cb>] ft_recv_seq+0x8b/0xc0
>> [tcm_fc]
>> Nov 11 13:46:09 dut kernel: [<ffffffffa03a0e1f>]
>> fc_exch_recv+0x61f/0xe20 [libfc]
>> Nov 11 13:46:09 dut kernel: [<ffffffff813c1123>] ?
>> skb_copy_bits+0x63/0x2c0
>> Nov 11 13:46:09 dut kernel: [<ffffffff813c15ea>] ?
>> __pskb_pull_tail+0x26a/0x360
>> Nov 11 13:46:09 dut kernel: [<ffffffffa015b86d>]
>> fcoe_recv_frame+0x18d/0x340 [fcoe]
>> Nov 11 13:46:09 dut kernel: [<ffffffff813c13df>] ?
>> __pskb_pull_tail+0x5f/0x360
>> Nov 11 13:46:09 dut kernel: [<ffffffff813c0404>] ?
>> __netdev_alloc_skb+0x24/0x50
>> Nov 11 13:46:09 dut kernel: [<ffffffffa015e52a>] fcoe_rcv+0x2aa/0x44c
>> [fcoe]
>> Nov 11 13:46:09 dut kernel: [<ffffffff8113c897>] ?
>> __kmalloc_node_track_caller+0x67/0xe0
>> Nov 11 13:46:09 dut kernel: [<ffffffff813c0404>] ?
>> __netdev_alloc_skb+0x24/0x50
>> Nov 11 13:46:09 dut kernel: [<ffffffff813cd39a>]
>> __netif_receive_skb+0x41a/0x5d0
>> Nov 11 13:46:09 dut kernel: [<ffffffff81012699>] ? read_tsc+0x9/0x20
>> Nov 11 13:46:09 dut kernel: [<ffffffff813ceab8>]
>> netif_receive_skb+0x58/0x80
>> Nov 11 13:46:09 dut kernel: [<ffffffff813cec20>]
>> napi_skb_finish+0x50/0x70
>> Nov 11 13:46:09 dut kernel: [<ffffffff813cf1a5>]
>> napi_gro_receive+0xc5/0xd0
>> Nov 11 13:46:09 dut kernel: [<ffffffffa0207a1f>]
>> ixgbe_clean_rx_irq+0x31f/0x840 [ixgbe]
>> Nov 11 13:46:09 dut kernel: [<ffffffffa02083a6>]
>> ixgbe_clean_rxtx_many+0x136/0x240 [ixgbe]
>> Nov 11 13:46:09 dut kernel: [<ffffffff813cf382>]
>> net_rx_action+0x102/0x250
>> Nov 11 13:46:09 dut kernel: [<ffffffff81068af2>]
>> __do_softirq+0xb2/0x240
>> Nov 11 13:46:09 dut kernel: [<ffffffff8100c07c>] call_softirq+0x1c/0x30
>> Nov 11 13:46:09 dut kernel: <EOI> [<ffffffff8100db25>] ?
>> do_softirq+0x65/0xa0
>> Nov 11 13:46:09 dut kernel: [<ffffffff81068664>]
>> local_bh_enable+0x94/0xa0
>> Nov 11 13:46:09 dut kernel: [<ffffffff813cdfd3>]
>> dev_queue_xmit+0x143/0x3b0
>> Nov 11 13:46:09 dut kernel: [<ffffffffa015d96e>] fcoe_xmit+0x30e/0x520
>> [fcoe]
>> Nov 11 13:46:09 dut kernel: [<ffffffffa03a2a13>] ?
>> _fc_frame_alloc+0x33/0x90 [libfc]
>> Nov 11 13:46:09 dut kernel: [<ffffffffa039f904>] fc_seq_send+0xb4/0x140
>> [libfc]
>> Nov 11 13:46:09 dut kernel: [<ffffffffa03c1722>]
>> ft_write_pending+0x112/0x160 [tcm_fc]
>> Nov 11 13:46:09 dut kernel: [<ffffffffa0347800>]
>> transport_generic_new_cmd+0x280/0x2b0 [target_core_mod]
>> Nov 11 13:46:09 dut kernel: [<ffffffffa03479d4>]
>> transport_processing_thread+0x1a4/0x7c0 [target_core_mod]
>> Nov 11 13:46:09 dut kernel: [<ffffffff810835d0>] ?
>> autoremove_wake_function+0x0/0x40
>> Nov 11 13:46:09 dut kernel: [<ffffffffa0347830>] ?
>> transport_processing_thread+0x0/0x7c0 [target_core_mod]
>> Nov 11 13:46:09 dut kernel: [<ffffffff81082f36>] kthread+0x96/0xa0
>> Nov 11 13:46:09 dut kernel: [<ffffffff8100bf84>]
>> kernel_thread_helper+0x4/0x10
>> Nov 11 13:46:09 dut kernel: [<ffffffff81082ea0>] ? kthread+0x0/0xa0
>> Nov 11 13:46:09 dut kernel: [<ffffffff8100bf80>] ?
>> kernel_thread_helper+0x0/0x10
>
> I started noticing these issues first when I ran I/O with larger
> filesizes (appr. 25GB), but I'm thinking that might be a red herring.
> I'll rebuild the kernel and tools to make sure nothing is out of sorts
> and will report on any additional findings.
>
> Thanks,
>
> Frank

FCP data frames are coming in at the interrupt level, and TCM expects to be called in a thread or non-interrupt context, since transport_generic_handle_data() may sleep.

A quick workaround would be to change the fast path in fcoe_rcv() so that data always goes through the per-cpu receive threads. That avoids part of the problem, but isn't anything like the right fix. It doesn't seem good to let TCM block FCoE's per-cpu receive thread either.

Here's a quick change if you want to just work around the problem. I haven't tested it:

diff --git a/drivers/scsi/fcoe/fcoe.c b/drivers/scsi/fcoe/fcoe.c
index feddb53..8f854cd 100644
--- a/drivers/scsi/fcoe/fcoe.c
+++ b/drivers/scsi/fcoe/fcoe.c
@@ -1285,6 +1285,7 @@ int fcoe_rcv(struct sk_buff *skb, struct net_device *netdev,
 	 * BLOCK softirq context.
 	 */
 	if (fh->fh_type == FC_TYPE_FCP &&
+	    0 &&
 	    cpu == smp_processor_id() &&
 	    skb_queue_empty(&fps->fcoe_rx_list)) {
 		spin_unlock_bh(&fps->fcoe_rx_list.lock);
---

Cheers,
Joe
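
(Aside, not from the thread: a minimal sketch of the direction Joe hints at -- have tcm_fc defer the data handling to process context via a workqueue, so neither the softirq path nor fcoe's per-cpu receive thread blocks in transport_generic_handle_data(). The ft_cmd_sketch structure and helper names are assumptions for illustration, not the real tcm_fc code.)

#include <linux/workqueue.h>

struct se_cmd;                                          /* from the target core headers */
extern int transport_generic_handle_data(struct se_cmd *);

/* Hypothetical per-command wrapper; the real struct ft_cmd differs. */
struct ft_cmd_sketch {
	struct work_struct	work;
	struct se_cmd		*se_cmd;
};

static void ft_data_work_fn(struct work_struct *work)
{
	struct ft_cmd_sketch *cmd =
		container_of(work, struct ft_cmd_sketch, work);

	/* Process context: sleeping in the target core is now allowed. */
	transport_generic_handle_data(cmd->se_cmd);
}

/*
 * Would be called at the end of ft_recv_write_data() instead of calling
 * transport_generic_handle_data() directly from the receive path.
 */
static void ft_queue_data_work(struct ft_cmd_sketch *cmd)
{
	INIT_WORK(&cmd->work, ft_data_work_fn);
	schedule_work(&cmd->work);	/* or a dedicated tcm_fc workqueue */
}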
