Joe,

Just want to give you a quick heads-up that I am up and running with the change you suggested.
Thanks,
Frank

> -----Original Message-----
> From: Joe Eykholt [mailto:[email protected]]
> Sent: Thursday, November 11, 2010 2:52 PM
> To: Jansen, Frank
> Cc: [email protected]
> Subject: Re: [Open-FCoE] transport_generic_handle_data - BUG: scheduling while atomic
>
> On 11/11/10 11:41 AM, Jansen, Frank wrote:
> > Greetings!
> >
> > I'm running 2.6.36 with Kiran Patil's patches from 10/28/10.
> >
> > I have 4 logical volumes configured over fcoe:
> >
> > [r...@dut ~]# tcm_node --listhbas
> > \------> iblock_0
> >         HBA Index: 1 plugin: iblock version: v4.0.0-rc5
> > \-------> r0_lun3
> >         Status: ACTIVATED  Execute/Left/Max Queue Depth: 0/32/32  SectorSize: 512  MaxSectors: 1024
> >         iBlock device: dm-4  UDEV PATH: /dev/vg_R0_p1/lv_R0_p1_l3  Major: 253 Minor: 4  CLAIMED: IBLOCK
> >         udev_path: /dev/vg_R0_p1/lv_R0_p1_l3
> > \-------> r0_lun2
> >         Status: ACTIVATED  Execute/Left/Max Queue Depth: 0/32/32  SectorSize: 512  MaxSectors: 1024
> >         iBlock device: dm-3  UDEV PATH: /dev/vg_R0_p1/lv_R0_p1_l2  Major: 253 Minor: 3  CLAIMED: IBLOCK
> >         udev_path: /dev/vg_R0_p1/lv_R0_p1_l2
> > \-------> r0_lun1
> >         Status: ACTIVATED  Execute/Left/Max Queue Depth: 0/32/32  SectorSize: 512  MaxSectors: 1024
> >         iBlock device: dm-2  UDEV PATH: /dev/vg_R0_p1/lv_R0_p1_l1  Major: 253 Minor: 2  CLAIMED: IBLOCK
> >         udev_path: /dev/vg_R0_p1/lv_R0_p1_l1
> > \-------> r0_lun0
> >         Status: ACTIVATED  Execute/Left/Max Queue Depth: 0/32/32  SectorSize: 512  MaxSectors: 1024
> >         iBlock device: dm-1  UDEV PATH: /dev/vg_R0_p1/lv_R0_p1_l0  Major: 253 Minor: 1  CLAIMED: IBLOCK
> >         udev_path: /dev/vg_R0_p1/lv_R0_p1_l0
> >
> > When any significant I/O load is put on any of the devices, I receive
> > a flood of the following messages:
> >
> >> Nov 11 13:46:09 dut kernel: BUG: scheduling while atomic: LIO_iblock/4439/0x00000101
> >> Nov 11 13:46:09 dut kernel: Modules linked in: fcoe libfcoe target_core_stgt target_core_pscsi target_core_file target_core_iblock ipt_MASQUERADE iptable_nat nf_nat bridge stp llc autofs4 tcm_fc libfc scsi_transport_fc scsi_tgt target_core_mod configfs sunrpc ipv6 dm_mirror dm_region_hash dm_log kvm_intel kvm uinput ixgbe ioatdma iTCO_wdt ses enclosure i2c_i801 i2c_core iTCO_vendor_support mdio sg igb dca pcspkr evbug evdev ext4 mbcache jbd2 sd_mod crc_t10dif pata_acpi ata_generic mpt2sas scsi_transport_sas ata_piix raid_class dm_mod [last unloaded: speedstep_lib]
> >> Nov 11 13:46:09 dut kernel: Pid: 4439, comm: LIO_iblock Not tainted 2.6.36+ #1
> >> Nov 11 13:46:09 dut kernel: Call Trace:
> >> Nov 11 13:46:09 dut kernel: <IRQ> [<ffffffff8104fb96>] __schedule_bug+0x66/0x70
> >> Nov 11 13:46:09 dut kernel: [<ffffffff8149779c>] schedule+0xa2c/0xa60
> >> Nov 11 13:46:09 dut kernel: [<ffffffff81497d73>] schedule_timeout+0x173/0x2e0
> >> Nov 11 13:46:09 dut kernel: [<ffffffff81071200>] ? process_timeout+0x0/0x10
> >> Nov 11 13:46:09 dut kernel: [<ffffffff81497f3e>] schedule_timeout_interruptible+0x1e/0x20
> >> Nov 11 13:46:09 dut kernel: [<ffffffff81072b39>] msleep_interruptible+0x39/0x50
> >> Nov 11 13:46:09 dut kernel: [<ffffffffa033ebfa>] transport_generic_handle_data+0x2a/0x80 [target_core_mod]
> >> Nov 11 13:46:09 dut kernel: [<ffffffffa03c33ee>] ft_recv_write_data+0x1fe/0x2b0 [tcm_fc]
> >> Nov 11 13:46:09 dut kernel: [<ffffffffa03c13cb>] ft_recv_seq+0x8b/0xc0 [tcm_fc]
> >> Nov 11 13:46:09 dut kernel: [<ffffffffa03a0e1f>] fc_exch_recv+0x61f/0xe20 [libfc]
> >> Nov 11 13:46:09 dut kernel: [<ffffffff813c1123>] ? skb_copy_bits+0x63/0x2c0
> >> Nov 11 13:46:09 dut kernel: [<ffffffff813c15ea>] ? __pskb_pull_tail+0x26a/0x360
> >> Nov 11 13:46:09 dut kernel: [<ffffffffa015b86d>] fcoe_recv_frame+0x18d/0x340 [fcoe]
> >> Nov 11 13:46:09 dut kernel: [<ffffffff813c13df>] ? __pskb_pull_tail+0x5f/0x360
> >> Nov 11 13:46:09 dut kernel: [<ffffffff813c0404>] ? __netdev_alloc_skb+0x24/0x50
> >> Nov 11 13:46:09 dut kernel: [<ffffffffa015e52a>] fcoe_rcv+0x2aa/0x44c [fcoe]
> >> Nov 11 13:46:09 dut kernel: [<ffffffff8113c897>] ? __kmalloc_node_track_caller+0x67/0xe0
> >> Nov 11 13:46:09 dut kernel: [<ffffffff813c0404>] ? __netdev_alloc_skb+0x24/0x50
> >> Nov 11 13:46:09 dut kernel: [<ffffffff813cd39a>] __netif_receive_skb+0x41a/0x5d0
> >> Nov 11 13:46:09 dut kernel: [<ffffffff81012699>] ? read_tsc+0x9/0x20
> >> Nov 11 13:46:09 dut kernel: [<ffffffff813ceab8>] netif_receive_skb+0x58/0x80
> >> Nov 11 13:46:09 dut kernel: [<ffffffff813cec20>] napi_skb_finish+0x50/0x70
> >> Nov 11 13:46:09 dut kernel: [<ffffffff813cf1a5>] napi_gro_receive+0xc5/0xd0
> >> Nov 11 13:46:09 dut kernel: [<ffffffffa0207a1f>] ixgbe_clean_rx_irq+0x31f/0x840 [ixgbe]
> >> Nov 11 13:46:09 dut kernel: [<ffffffffa02083a6>] ixgbe_clean_rxtx_many+0x136/0x240 [ixgbe]
> >> Nov 11 13:46:09 dut kernel: [<ffffffff813cf382>] net_rx_action+0x102/0x250
> >> Nov 11 13:46:09 dut kernel: [<ffffffff81068af2>] __do_softirq+0xb2/0x240
> >> Nov 11 13:46:09 dut kernel: [<ffffffff8100c07c>] call_softirq+0x1c/0x30
> >> Nov 11 13:46:09 dut kernel: <EOI> [<ffffffff8100db25>] ? do_softirq+0x65/0xa0
> >> Nov 11 13:46:09 dut kernel: [<ffffffff81068664>] local_bh_enable+0x94/0xa0
> >> Nov 11 13:46:09 dut kernel: [<ffffffff813cdfd3>] dev_queue_xmit+0x143/0x3b0
> >> Nov 11 13:46:09 dut kernel: [<ffffffffa015d96e>] fcoe_xmit+0x30e/0x520 [fcoe]
> >> Nov 11 13:46:09 dut kernel: [<ffffffffa03a2a13>] ? _fc_frame_alloc+0x33/0x90 [libfc]
> >> Nov 11 13:46:09 dut kernel: [<ffffffffa039f904>] fc_seq_send+0xb4/0x140 [libfc]
> >> Nov 11 13:46:09 dut kernel: [<ffffffffa03c1722>] ft_write_pending+0x112/0x160 [tcm_fc]
> >> Nov 11 13:46:09 dut kernel: [<ffffffffa0347800>] transport_generic_new_cmd+0x280/0x2b0 [target_core_mod]
> >> Nov 11 13:46:09 dut kernel: [<ffffffffa03479d4>] transport_processing_thread+0x1a4/0x7c0 [target_core_mod]
> >> Nov 11 13:46:09 dut kernel: [<ffffffff810835d0>] ? autoremove_wake_function+0x0/0x40
> >> Nov 11 13:46:09 dut kernel: [<ffffffffa0347830>] ? transport_processing_thread+0x0/0x7c0 [target_core_mod]
> >> Nov 11 13:46:09 dut kernel: [<ffffffff81082f36>] kthread+0x96/0xa0
> >> Nov 11 13:46:09 dut kernel: [<ffffffff8100bf84>] kernel_thread_helper+0x4/0x10
> >> Nov 11 13:46:09 dut kernel: [<ffffffff81082ea0>] ? kthread+0x0/0xa0
> >> Nov 11 13:46:09 dut kernel: [<ffffffff8100bf80>] ? kernel_thread_helper+0x0/0x10
> >
> > I started noticing these issues first when I ran I/O with larger
> > filesizes (appr. 25GB), but I'm thinking that might be a red herring.
> > I'll rebuild the kernel and tools to make sure nothing is out of sorts
> > and will report on any additional findings.
> >
> > Thanks,
> >
> > Frank
>
> FCP data frames are coming in at the interrupt level, and TCM expects
> to be called in a thread or non-interrupt context, since
> transport_generic_handle_data() may sleep.
>
> A quick workaround would be to change the fast path in fcoe_rcv() so that
> data always goes through the per-cpu receive threads.  That avoids part of
> the problem, but isn't anything like the right fix.  It doesn't seem good
> to let TCM block FCoE's per-cpu receive thread either.
>
> Here's a quick change if you want to just work around the problem.
> I haven't tested it:
>
> diff --git a/drivers/scsi/fcoe/fcoe.c b/drivers/scsi/fcoe/fcoe.c
> index feddb53..8f854cd 100644
> --- a/drivers/scsi/fcoe/fcoe.c
> +++ b/drivers/scsi/fcoe/fcoe.c
> @@ -1285,6 +1285,7 @@ int fcoe_rcv(struct sk_buff *skb, struct net_device *netdev,
> 	 * BLOCK softirq context.
> 	 */
> 	if (fh->fh_type == FC_TYPE_FCP &&
> +	    0 &&
> 	    cpu == smp_processor_id() &&
> 	    skb_queue_empty(&fps->fcoe_rx_list)) {
> 		spin_unlock_bh(&fps->fcoe_rx_list.lock);
>
> ---
>
> Cheers,
> Joe
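
For anyone hitting the same BUG, the underlying constraint is that transport_generic_handle_data() can sleep, while fcoe_rcv()'s fast path delivers FCP data frames in softirq context. Besides the fcoe_rcv() workaround above, the usual way to satisfy that constraint is to defer the sleep-capable call to process context via a workqueue. The sketch below only illustrates that pattern and is not the actual tcm_fc fix; the names ft_data_work, ft_deferred_data and ft_queue_recv_data are invented for the example, and the target core headers that declare struct se_cmd and transport_generic_handle_data() are assumed to be available.

/*
 * Illustrative sketch only: defer transport_generic_handle_data() from
 * softirq context to process context via a workqueue.  Names are made up
 * for this example and do not come from the thread or the tcm_fc source.
 */
#include <linux/workqueue.h>
#include <linux/slab.h>

struct ft_data_work {
	struct work_struct work;
	struct se_cmd *se_cmd;		/* command whose data just arrived */
};

/* Runs in process (kworker) context, so sleeping is allowed here. */
static void ft_deferred_data(struct work_struct *work)
{
	struct ft_data_work *dw = container_of(work, struct ft_data_work, work);

	transport_generic_handle_data(dw->se_cmd);	/* may sleep */
	kfree(dw);
}

/* Called from the softirq receive path instead of calling TCM directly. */
static int ft_queue_recv_data(struct se_cmd *se_cmd)
{
	struct ft_data_work *dw;

	dw = kmalloc(sizeof(*dw), GFP_ATOMIC);	/* atomic context: must not sleep */
	if (!dw)
		return -ENOMEM;

	dw->se_cmd = se_cmd;
	INIT_WORK(&dw->work, ft_deferred_data);
	schedule_work(&dw->work);		/* ft_deferred_data() runs later */
	return 0;
}

The tradeoff is an extra allocation and a context switch per deferred call, so a real fix would need to weigh that against blocking FCoE's per-cpu receive threads, as Joe notes above.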
