Joe,

Just want to give you a quick heads-up that I am up and running with the change you suggested.
Thanks,
Frank

> -----Original Message-----
> From: Joe Eykholt [mailto:[email protected]]
> Sent: Thursday, November 11, 2010 2:52 PM
> To: Jansen, Frank
> Cc: [email protected]
> Subject: Re: [Open-FCoE] transport_generic_handle_data - BUG: scheduling while atomic
>
> On 11/11/10 11:41 AM, Jansen, Frank wrote:
> > Greetings!
> >
> > I'm running 2.6.36 with Kiran Patil's patches from 10/28/10.
> >
> > I have 4 logical volumes configured over fcoe:
> >
> > [r...@dut ~]# tcm_node --listhbas
> > \------> iblock_0
> >         HBA Index: 1 plugin: iblock version: v4.0.0-rc5
> > \-------> r0_lun3
> >         Status: ACTIVATED  Execute/Left/Max Queue Depth: 0/32/32  SectorSize: 512  MaxSectors: 1024
> >         iBlock device: dm-4  UDEV PATH: /dev/vg_R0_p1/lv_R0_p1_l3  Major: 253 Minor: 4  CLAIMED: IBLOCK
> >         udev_path: /dev/vg_R0_p1/lv_R0_p1_l3
> > \-------> r0_lun2
> >         Status: ACTIVATED  Execute/Left/Max Queue Depth: 0/32/32  SectorSize: 512  MaxSectors: 1024
> >         iBlock device: dm-3  UDEV PATH: /dev/vg_R0_p1/lv_R0_p1_l2  Major: 253 Minor: 3  CLAIMED: IBLOCK
> >         udev_path: /dev/vg_R0_p1/lv_R0_p1_l2
> > \-------> r0_lun1
> >         Status: ACTIVATED  Execute/Left/Max Queue Depth: 0/32/32  SectorSize: 512  MaxSectors: 1024
> >         iBlock device: dm-2  UDEV PATH: /dev/vg_R0_p1/lv_R0_p1_l1  Major: 253 Minor: 2  CLAIMED: IBLOCK
> >         udev_path: /dev/vg_R0_p1/lv_R0_p1_l1
> > \-------> r0_lun0
> >         Status: ACTIVATED  Execute/Left/Max Queue Depth: 0/32/32  SectorSize: 512  MaxSectors: 1024
> >         iBlock device: dm-1  UDEV PATH: /dev/vg_R0_p1/lv_R0_p1_l0  Major: 253 Minor: 1  CLAIMED: IBLOCK
> >         udev_path: /dev/vg_R0_p1/lv_R0_p1_l0
> >
> > When any significant I/O load is put on any of the devices, I receive
> > a flood of the following messages:
> >
> >> Nov 11 13:46:09 dut kernel: BUG: scheduling while atomic: LIO_iblock/4439/0x00000101
> >> Nov 11 13:46:09 dut kernel: Modules linked in: fcoe libfcoe target_core_stgt target_core_pscsi target_core_file target_core_iblock ipt_MASQUERADE iptable_nat nf_nat bridge stp llc autofs4 tcm_fc libfc scsi_transport_fc scsi_tgt target_core_mod configfs sunrpc ipv6 dm_mirror dm_region_hash dm_log kvm_intel kvm uinput ixgbe ioatdma iTCO_wdt ses enclosure i2c_i801 i2c_core iTCO_vendor_support mdio sg igb dca pcspkr evbug evdev ext4 mbcache jbd2 sd_mod crc_t10dif pata_acpi ata_generic mpt2sas scsi_transport_sas ata_piix raid_class dm_mod [last unloaded: speedstep_lib]
> >> Nov 11 13:46:09 dut kernel: Pid: 4439, comm: LIO_iblock Not tainted 2.6.36+ #1
> >> Nov 11 13:46:09 dut kernel: Call Trace:
> >> Nov 11 13:46:09 dut kernel: <IRQ> [<ffffffff8104fb96>] __schedule_bug+0x66/0x70
> >> Nov 11 13:46:09 dut kernel: [<ffffffff8149779c>] schedule+0xa2c/0xa60
> >> Nov 11 13:46:09 dut kernel: [<ffffffff81497d73>] schedule_timeout+0x173/0x2e0
> >> Nov 11 13:46:09 dut kernel: [<ffffffff81071200>] ? process_timeout+0x0/0x10
> >> Nov 11 13:46:09 dut kernel: [<ffffffff81497f3e>] schedule_timeout_interruptible+0x1e/0x20
> >> Nov 11 13:46:09 dut kernel: [<ffffffff81072b39>] msleep_interruptible+0x39/0x50
> >> Nov 11 13:46:09 dut kernel: [<ffffffffa033ebfa>] transport_generic_handle_data+0x2a/0x80 [target_core_mod]
> >> Nov 11 13:46:09 dut kernel: [<ffffffffa03c33ee>] ft_recv_write_data+0x1fe/0x2b0 [tcm_fc]
> >> Nov 11 13:46:09 dut kernel: [<ffffffffa03c13cb>] ft_recv_seq+0x8b/0xc0 [tcm_fc]
> >> Nov 11 13:46:09 dut kernel: [<ffffffffa03a0e1f>] fc_exch_recv+0x61f/0xe20 [libfc]
> >> Nov 11 13:46:09 dut kernel: [<ffffffff813c1123>] ? skb_copy_bits+0x63/0x2c0
> >> Nov 11 13:46:09 dut kernel: [<ffffffff813c15ea>] ? __pskb_pull_tail+0x26a/0x360
> >> Nov 11 13:46:09 dut kernel: [<ffffffffa015b86d>] fcoe_recv_frame+0x18d/0x340 [fcoe]
> >> Nov 11 13:46:09 dut kernel: [<ffffffff813c13df>] ? __pskb_pull_tail+0x5f/0x360
> >> Nov 11 13:46:09 dut kernel: [<ffffffff813c0404>] ? __netdev_alloc_skb+0x24/0x50
> >> Nov 11 13:46:09 dut kernel: [<ffffffffa015e52a>] fcoe_rcv+0x2aa/0x44c [fcoe]
> >> Nov 11 13:46:09 dut kernel: [<ffffffff8113c897>] ? __kmalloc_node_track_caller+0x67/0xe0
> >> Nov 11 13:46:09 dut kernel: [<ffffffff813c0404>] ? __netdev_alloc_skb+0x24/0x50
> >> Nov 11 13:46:09 dut kernel: [<ffffffff813cd39a>] __netif_receive_skb+0x41a/0x5d0
> >> Nov 11 13:46:09 dut kernel: [<ffffffff81012699>] ? read_tsc+0x9/0x20
> >> Nov 11 13:46:09 dut kernel: [<ffffffff813ceab8>] netif_receive_skb+0x58/0x80
> >> Nov 11 13:46:09 dut kernel: [<ffffffff813cec20>] napi_skb_finish+0x50/0x70
> >> Nov 11 13:46:09 dut kernel: [<ffffffff813cf1a5>] napi_gro_receive+0xc5/0xd0
> >> Nov 11 13:46:09 dut kernel: [<ffffffffa0207a1f>] ixgbe_clean_rx_irq+0x31f/0x840 [ixgbe]
> >> Nov 11 13:46:09 dut kernel: [<ffffffffa02083a6>] ixgbe_clean_rxtx_many+0x136/0x240 [ixgbe]
> >> Nov 11 13:46:09 dut kernel: [<ffffffff813cf382>] net_rx_action+0x102/0x250
> >> Nov 11 13:46:09 dut kernel: [<ffffffff81068af2>] __do_softirq+0xb2/0x240
> >> Nov 11 13:46:09 dut kernel: [<ffffffff8100c07c>] call_softirq+0x1c/0x30
> >> Nov 11 13:46:09 dut kernel: <EOI> [<ffffffff8100db25>] ? do_softirq+0x65/0xa0
> >> Nov 11 13:46:09 dut kernel: [<ffffffff81068664>] local_bh_enable+0x94/0xa0
> >> Nov 11 13:46:09 dut kernel: [<ffffffff813cdfd3>] dev_queue_xmit+0x143/0x3b0
> >> Nov 11 13:46:09 dut kernel: [<ffffffffa015d96e>] fcoe_xmit+0x30e/0x520 [fcoe]
> >> Nov 11 13:46:09 dut kernel: [<ffffffffa03a2a13>] ? _fc_frame_alloc+0x33/0x90 [libfc]
> >> Nov 11 13:46:09 dut kernel: [<ffffffffa039f904>] fc_seq_send+0xb4/0x140 [libfc]
> >> Nov 11 13:46:09 dut kernel: [<ffffffffa03c1722>] ft_write_pending+0x112/0x160 [tcm_fc]
> >> Nov 11 13:46:09 dut kernel: [<ffffffffa0347800>] transport_generic_new_cmd+0x280/0x2b0 [target_core_mod]
> >> Nov 11 13:46:09 dut kernel: [<ffffffffa03479d4>] transport_processing_thread+0x1a4/0x7c0 [target_core_mod]
> >> Nov 11 13:46:09 dut kernel: [<ffffffff810835d0>] ? autoremove_wake_function+0x0/0x40
> >> Nov 11 13:46:09 dut kernel: [<ffffffffa0347830>] ? transport_processing_thread+0x0/0x7c0 [target_core_mod]
> >> Nov 11 13:46:09 dut kernel: [<ffffffff81082f36>] kthread+0x96/0xa0
> >> Nov 11 13:46:09 dut kernel: [<ffffffff8100bf84>] kernel_thread_helper+0x4/0x10
> >> Nov 11 13:46:09 dut kernel: [<ffffffff81082ea0>] ? kthread+0x0/0xa0
> >> Nov 11 13:46:09 dut kernel: [<ffffffff8100bf80>] ? kernel_thread_helper+0x0/0x10
> >
> > I started noticing these issues first when I ran I/O with larger
> > filesizes (appr. 25GB), but I'm thinking that might be a red herring.
> > I'll rebuild the kernel and tools to make sure nothing is out of sorts
> > and will report on any additional findings.
> >
> > Thanks,
> >
> > Frank
>
> FCP data frames are coming in at the interrupt level, and TCM expects
> to be called in a thread or non-interrupt context, since
> transport_generic_handle_data() may sleep.
>
> A quick workaround would be to change the fast path in fcoe_rcv() so that
> data always goes through the per-cpu receive threads.  That avoids part of
> the problem, but isn't anything like the right fix.  It doesn't seem good
> to let TCM block FCoE's per-cpu receive thread either.
>
> Here's a quick change if you want to just work around the problem.
> I haven't tested it:
>
> diff --git a/drivers/scsi/fcoe/fcoe.c b/drivers/scsi/fcoe/fcoe.c
> index feddb53..8f854cd 100644
> --- a/drivers/scsi/fcoe/fcoe.c
> +++ b/drivers/scsi/fcoe/fcoe.c
> @@ -1285,6 +1285,7 @@ int fcoe_rcv(struct sk_buff *skb, struct net_device *netdev,
> 	 * BLOCK softirq context.
> 	 */
> 	if (fh->fh_type == FC_TYPE_FCP &&
> +	    0 &&
> 	    cpu == smp_processor_id() &&
> 	    skb_queue_empty(&fps->fcoe_rx_list)) {
> 		spin_unlock_bh(&fps->fcoe_rx_list.lock);
>
> ---
>
> Cheers,
> Joe
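
For anyone hitting the same BUG, the underlying constraint is that transport_generic_handle_data() can sleep, while fcoe_rcv()'s fast path delivers FCP data frames in softirq context. Besides the fcoe_rcv() workaround above, the usual way to satisfy that constraint is to defer the sleep-capable call to process context via a workqueue. The sketch below only illustrates that pattern and is not the actual tcm_fc fix; the names ft_data_work, ft_deferred_data and ft_queue_recv_data are invented for the example, and the target core headers that declare struct se_cmd and transport_generic_handle_data() are assumed to be available.

/*
 * Illustrative sketch only: defer transport_generic_handle_data() from
 * softirq context to process context via a workqueue.  Names are made up
 * for this example and do not come from the thread or the tcm_fc source.
 */
#include <linux/workqueue.h>
#include <linux/slab.h>

struct ft_data_work {
	struct work_struct work;
	struct se_cmd *se_cmd;		/* command whose data just arrived */
};

/* Runs in process (kworker) context, so sleeping is allowed here. */
static void ft_deferred_data(struct work_struct *work)
{
	struct ft_data_work *dw = container_of(work, struct ft_data_work, work);

	transport_generic_handle_data(dw->se_cmd);	/* may sleep */
	kfree(dw);
}

/* Called from the softirq receive path instead of calling TCM directly. */
static int ft_queue_recv_data(struct se_cmd *se_cmd)
{
	struct ft_data_work *dw;

	dw = kmalloc(sizeof(*dw), GFP_ATOMIC);	/* atomic context: must not sleep */
	if (!dw)
		return -ENOMEM;

	dw->se_cmd = se_cmd;
	INIT_WORK(&dw->work, ft_deferred_data);
	schedule_work(&dw->work);		/* ft_deferred_data() runs later */
	return 0;
}

The tradeoff is an extra allocation and a context switch per deferred call, so a real fix would need to weigh that against blocking FCoE's per-cpu receive threads, as Joe notes above.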
