On 11/12/10 5:53 AM, Jansen, Frank wrote:
> Joe,
>
> Just want to give you a quick heads-up that I am up and running with the
> change you suggested.
Glad to hear it. I was trying to think of a better way of maintaining the
fast FCP path for initiators and skipping it for targets. Maybe testing the
F_CTL EX_CTX bit would work. For targets it'll be 0 (on FCP exchanges), for
initiators 1.
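
Something along these lines, instead of the '0 &&' hack in the patch
below -- untested sketch, using the existing FC_FC_EX_CTX flag and the
ntoh24() helper from the FC headers:

 	if (fh->fh_type == FC_TYPE_FCP &&
+	    (ntoh24(fh->fh_f_ctl) & FC_FC_EX_CTX) &&	/* we originated the exchange */
 	    cpu == smp_processor_id() &&
 	    skb_queue_empty(&fps->fcoe_rx_list)) {
 		spin_unlock_bh(&fps->fcoe_rx_list.lock);

The responder sets EX_CTX in F_CTL, so this keeps the softirq fast path
for frames an initiator receives and sends target-side FCP data through
the per-cpu receive thread instead.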
Cheers,
Joe
>> -----Original Message-----
>> From: Joe Eykholt [mailto:[email protected]]
>> Sent: Thursday, November 11, 2010 2:52 PM
>> To: Jansen, Frank
>> Cc: [email protected]
>> Subject: Re: [Open-FCoE] transport_generic_handle_data - BUG:
>> scheduling while atomic
>>
>>
>>
>> On 11/11/10 11:41 AM, Jansen, Frank wrote:
>>> Greetings!
>>>
>>> I'm running 2.6.36 with Kiran Patil's patches from 10/28/10.
>>>
>>> I have 4 logical volumes configured over fcoe:
>>>
>>> [r...@dut ~]# tcm_node --listhbas
>>> \------> iblock_0
>>> HBA Index: 1 plugin: iblock version: v4.0.0-rc5
>>> \-------> r0_lun3
>>> Status: ACTIVATED Execute/Left/Max Queue Depth: 0/32/32
>>> SectorSize: 512 MaxSectors: 1024
>>> iBlock device: dm-4 UDEV PATH: /dev/vg_R0_p1/lv_R0_p1_l3
>>> Major: 253 Minor: 4 CLAIMED: IBLOCK
>>> udev_path: /dev/vg_R0_p1/lv_R0_p1_l3
>>> \-------> r0_lun2
>>> Status: ACTIVATED Execute/Left/Max Queue Depth: 0/32/32
>>> SectorSize: 512 MaxSectors: 1024
>>> iBlock device: dm-3 UDEV PATH: /dev/vg_R0_p1/lv_R0_p1_l2
>>> Major: 253 Minor: 3 CLAIMED: IBLOCK
>>> udev_path: /dev/vg_R0_p1/lv_R0_p1_l2
>>> \-------> r0_lun1
>>> Status: ACTIVATED Execute/Left/Max Queue Depth: 0/32/32
>>> SectorSize: 512 MaxSectors: 1024
>>> iBlock device: dm-2 UDEV PATH: /dev/vg_R0_p1/lv_R0_p1_l1
>>> Major: 253 Minor: 2 CLAIMED: IBLOCK
>>> udev_path: /dev/vg_R0_p1/lv_R0_p1_l1
>>> \-------> r0_lun0
>>> Status: ACTIVATED Execute/Left/Max Queue Depth: 0/32/32
>>> SectorSize: 512 MaxSectors: 1024
>>> iBlock device: dm-1 UDEV PATH: /dev/vg_R0_p1/lv_R0_p1_l0
>>> Major: 253 Minor: 1 CLAIMED: IBLOCK
>>> udev_path: /dev/vg_R0_p1/lv_R0_p1_l0
>>>
>>> When any significant I/O load is put on any of the devices, I receive
>>> a flood of the following messages:
>>>
>>>> Nov 11 13:46:09 dut kernel: BUG: scheduling while atomic:
>>>> LIO_iblock/4439/0x00000101
>>>> Nov 11 13:46:09 dut kernel: Modules linked in: fcoe libfcoe
>>>> target_core_stgt target_core_pscsi target_core_file target_core_iblock
>>>> ipt_MASQUERADE iptable_nat nf_nat bridge stp llc autofs4 tcm_fc libfc
>>>> scsi_transport_fc scsi_tgt target_core_mod configfs sunrpc ipv6
>>>> dm_mirror dm_region_hash dm_log kvm_intel kvm uinput ixgbe ioatdma
>>>> iTCO_wdt ses enclosure i2c_i801 i2c_core iTCO_vendor_support mdio sg
>>>> igb dca pcspkr evbug evdev ext4 mbcache jbd2 sd_mod crc_t10dif
>>>> pata_acpi ata_generic mpt2sas scsi_transport_sas ata_piix raid_class
>>>> dm_mod [last unloaded: speedstep_lib]
>>>> Nov 11 13:46:09 dut kernel: Pid: 4439, comm: LIO_iblock Not tainted
>>>> 2.6.36+ #1
>>>> Nov 11 13:46:09 dut kernel: Call Trace:
>>>> Nov 11 13:46:09 dut kernel: <IRQ> [<ffffffff8104fb96>]
>>>> __schedule_bug+0x66/0x70
>>>> Nov 11 13:46:09 dut kernel: [<ffffffff8149779c>] schedule+0xa2c/0xa60
>>>> Nov 11 13:46:09 dut kernel: [<ffffffff81497d73>]
>>>> schedule_timeout+0x173/0x2e0
>>>> Nov 11 13:46:09 dut kernel: [<ffffffff81071200>] ?
>>>> process_timeout+0x0/0x10
>>>> Nov 11 13:46:09 dut kernel: [<ffffffff81497f3e>]
>>>> schedule_timeout_interruptible+0x1e/0x20
>>>> Nov 11 13:46:09 dut kernel: [<ffffffff81072b39>]
>>>> msleep_interruptible+0x39/0x50
>>>> Nov 11 13:46:09 dut kernel: [<ffffffffa033ebfa>]
>>>> transport_generic_handle_data+0x2a/0x80 [target_core_mod]
>>>> Nov 11 13:46:09 dut kernel: [<ffffffffa03c33ee>]
>>>> ft_recv_write_data+0x1fe/0x2b0 [tcm_fc]
>>>> Nov 11 13:46:09 dut kernel: [<ffffffffa03c13cb>] ft_recv_seq+0x8b/0xc0 [tcm_fc]
>>>> Nov 11 13:46:09 dut kernel: [<ffffffffa03a0e1f>]
>>>> fc_exch_recv+0x61f/0xe20 [libfc]
>>>> Nov 11 13:46:09 dut kernel: [<ffffffff813c1123>] ?
>>>> skb_copy_bits+0x63/0x2c0
>>>> Nov 11 13:46:09 dut kernel: [<ffffffff813c15ea>] ?
>>>> __pskb_pull_tail+0x26a/0x360
>>>> Nov 11 13:46:09 dut kernel: [<ffffffffa015b86d>]
>>>> fcoe_recv_frame+0x18d/0x340 [fcoe]
>>>> Nov 11 13:46:09 dut kernel: [<ffffffff813c13df>] ?
>>>> __pskb_pull_tail+0x5f/0x360
>>>> Nov 11 13:46:09 dut kernel: [<ffffffff813c0404>] ?
>>>> __netdev_alloc_skb+0x24/0x50
>>>> Nov 11 13:46:09 dut kernel: [<ffffffffa015e52a>] fcoe_rcv+0x2aa/0x44c [fcoe]
>>>> Nov 11 13:46:09 dut kernel: [<ffffffff8113c897>] ?
>>>> __kmalloc_node_track_caller+0x67/0xe0
>>>> Nov 11 13:46:09 dut kernel: [<ffffffff813c0404>] ?
>>>> __netdev_alloc_skb+0x24/0x50
>>>> Nov 11 13:46:09 dut kernel: [<ffffffff813cd39a>]
>>>> __netif_receive_skb+0x41a/0x5d0
>>>> Nov 11 13:46:09 dut kernel: [<ffffffff81012699>] ? read_tsc+0x9/0x20
>>>> Nov 11 13:46:09 dut kernel: [<ffffffff813ceab8>]
>>>> netif_receive_skb+0x58/0x80
>>>> Nov 11 13:46:09 dut kernel: [<ffffffff813cec20>]
>>>> napi_skb_finish+0x50/0x70
>>>> Nov 11 13:46:09 dut kernel: [<ffffffff813cf1a5>]
>>>> napi_gro_receive+0xc5/0xd0
>>>> Nov 11 13:46:09 dut kernel: [<ffffffffa0207a1f>]
>>>> ixgbe_clean_rx_irq+0x31f/0x840 [ixgbe]
>>>> Nov 11 13:46:09 dut kernel: [<ffffffffa02083a6>]
>>>> ixgbe_clean_rxtx_many+0x136/0x240 [ixgbe]
>>>> Nov 11 13:46:09 dut kernel: [<ffffffff813cf382>]
>>>> net_rx_action+0x102/0x250
>>>> Nov 11 13:46:09 dut kernel: [<ffffffff81068af2>]
>>>> __do_softirq+0xb2/0x240
>>>> Nov 11 13:46:09 dut kernel: [<ffffffff8100c07c>] call_softirq+0x1c/0x30
>>>> Nov 11 13:46:09 dut kernel: <EOI> [<ffffffff8100db25>] ?
>>>> do_softirq+0x65/0xa0
>>>> Nov 11 13:46:09 dut kernel: [<ffffffff81068664>]
>>>> local_bh_enable+0x94/0xa0
>>>> Nov 11 13:46:09 dut kernel: [<ffffffff813cdfd3>]
>>>> dev_queue_xmit+0x143/0x3b0
>>>> Nov 11 13:46:09 dut kernel: [<ffffffffa015d96e>] fcoe_xmit+0x30e/0x520 [fcoe]
>>>> Nov 11 13:46:09 dut kernel: [<ffffffffa03a2a13>] ?
>>>> _fc_frame_alloc+0x33/0x90 [libfc]
>>>> Nov 11 13:46:09 dut kernel: [<ffffffffa039f904>] fc_seq_send+0xb4/0x140 [libfc]
>>>> Nov 11 13:46:09 dut kernel: [<ffffffffa03c1722>]
>>>> ft_write_pending+0x112/0x160 [tcm_fc]
>>>> Nov 11 13:46:09 dut kernel: [<ffffffffa0347800>]
>>>> transport_generic_new_cmd+0x280/0x2b0 [target_core_mod]
>>>> Nov 11 13:46:09 dut kernel: [<ffffffffa03479d4>]
>>>> transport_processing_thread+0x1a4/0x7c0 [target_core_mod]
>>>> Nov 11 13:46:09 dut kernel: [<ffffffff810835d0>] ?
>>>> autoremove_wake_function+0x0/0x40
>>>> Nov 11 13:46:09 dut kernel: [<ffffffffa0347830>] ?
>>>> transport_processing_thread+0x0/0x7c0 [target_core_mod]
>>>> Nov 11 13:46:09 dut kernel: [<ffffffff81082f36>] kthread+0x96/0xa0
>>>> Nov 11 13:46:09 dut kernel: [<ffffffff8100bf84>]
>>>> kernel_thread_helper+0x4/0x10
>>>> Nov 11 13:46:09 dut kernel: [<ffffffff81082ea0>] ? kthread+0x0/0xa0
>>>> Nov 11 13:46:09 dut kernel: [<ffffffff8100bf80>] ?
>>>> kernel_thread_helper+0x0/0x10
>>>
>>> I started noticing these issues first when I ran I/O with larger
>>> file sizes (approx. 25 GB), but I'm thinking that might be a red herring.
>>> I'll rebuild the kernel and tools to make sure nothing is out of sorts
>>> and will report on any additional findings.
>>>
>>> Thanks,
>>>
>>> Frank
>>
>> FCP data frames are coming in at the interrupt level, and TCM expects
>> to be called in a thread or non-interrupt context, since
>> transport_generic_handle_data() may sleep.
>>
>> A quick workaround would be to change the fast path in fcoe_rcv() so that
>> data always goes through the per-cpu receive threads. That avoids part of
>> the problem, but isn't anything like the right fix. It doesn't seem good
>> to let TCM block FCoE's per-cpu receive thread either.
>>
>> Here's a quick change if you want to just work around the problem.
>> I haven't tested it:
>>
>> diff --git a/drivers/scsi/fcoe/fcoe.c b/drivers/scsi/fcoe/fcoe.c
>> index feddb53..8f854cd 100644
>> --- a/drivers/scsi/fcoe/fcoe.c
>> +++ b/drivers/scsi/fcoe/fcoe.c
>> @@ -1285,6 +1285,7 @@ int fcoe_rcv(struct sk_buff *skb, struct net_device *netdev,
>>  	 * BLOCK softirq context.
>>  	 */
>>  	if (fh->fh_type == FC_TYPE_FCP &&
>> +	    0 &&
>>  	    cpu == smp_processor_id() &&
>>  	    skb_queue_empty(&fps->fcoe_rx_list)) {
>>  		spin_unlock_bh(&fps->fcoe_rx_list.lock);
>>
>> ---
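>>
>> With that in place every FCP frame takes the slower branch, which (if
>> I remember the code right -- roughly, not exact) just queues the skb
>> and wakes the per-cpu thread:
>>
>> 	} else {
>> 		/* defer to the per-cpu fcoe receive thread */
>> 		__skb_queue_tail(&fps->fcoe_rx_list, skb);
>> 		if (fps->fcoe_rx_list.qlen == 1)
>> 			wake_up_process(fps->thread);
>> 		spin_unlock_bh(&fps->fcoe_rx_list.lock);
>> 	}
>>
>> so fcoe_recv_frame(), and everything TCM does under it, ends up running
>> in thread context.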
>>
>> Cheers,
>> Joe
>>
>>
>>
>
_______________________________________________
devel mailing list
[email protected]
http://www.open-fcoe.org/mailman/listinfo/devel