> 
> From: Justin Cannon [mailto:austinxxh-m...@yahoo.com]
> Sent: Monday, December 19, 2011 3:12 PM
> To: Zou, Yi; Bart Van Assche
> Cc: devel@open-fcoe.org
> Subject: Re: [Open-FCoE] 3.1.4: enabling vn2vn mode triggers circular
> locking
> 
> I could not get p2p (not vn2vn) working on 2.6.34/35/36 with scst/fcst.
> It failed with 'no FCoE interface' after detecting no HBA. In my case I'm
> using Ethernet only, so I wonder why it quits when it sees no HBA: the
> Ethernet NIC should act as its HBA now, and the kernel sees it and reports
> that the link is up.
Can you just do 'echo ethx > /sys/module/fcoe/parameters/create'? That's
what I remember working. fcoe_transport was added around 2.6.38/39 as the
interface through which all underlying LLDs create instances via fcoe-utils,
where fcoeadm talks to the fcoe service and hbalib. You don't necessarily
need it to get create going.

yi
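
For anyone who wants to script the sysfs step above rather than type the echo, here is a minimal C sketch. The path is the one quoted in the message; whether it exists depends on the kernel version and on the fcoe module being loaded. The helper name fcoe_write_create() and the interface name "eth2" in the usage note are illustrative assumptions, not part of fcoe-utils.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Path quoted in the message above; it may not exist on every kernel. */
#define FCOE_CREATE_PATH "/sys/module/fcoe/parameters/create"

/*
 * Hypothetical helper: write an interface name to a sysfs-style control
 * file, as 'echo ethx > path' would (minus the trailing newline).
 * Returns 0 on success or a negative errno value on failure.
 */
static int fcoe_write_create(const char *path, const char *ifname)
{
    FILE *f;
    int rc = 0;

    assert(path != NULL && ifname != NULL); /* caller bug otherwise */

    if (ifname[0] == '\0')
        return -EINVAL;

    f = fopen(path, "w");
    if (!f)
        return -errno;

    if (fputs(ifname, f) == EOF)
        rc = -EIO;

    /* stdio buffers the data, so a rejected write may only show up here. */
    if (fclose(f) == EOF && rc == 0)
        rc = -errno;

    return rc;
}
```

A caller would use something like fcoe_write_create(FCOE_CREATE_PATH, "eth2") as root and treat a negative return as "the module is not loaded or the kernel refused the interface".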

> 
> I tried to use the in-kernel code (RTS code) and got lost immediately in
> the git locations and naming changes (lio, tcm, rtsadmin, targetcli, python
> dependencies, etc.), not to mention that documentation for the new storage
> stack is very hard to find, if it exists.
> 
> Thanks,
> xxiao
> 
> ________________________________________
> From: "Zou, Yi" <yi....@intel.com>
> To: Justin Cannon <austinxxh-m...@yahoo.com>; Bart Van Assche
> <bvanass...@acm.org>
> Cc: "devel@open-fcoe.org" <devel@open-fcoe.org>
> Sent: Monday, December 19, 2011 4:28 PM
> Subject: RE: [Open-FCoE] 3.1.4: enabling vn2vn mode triggers circular
> locking
> 
> >
> > From: Justin Cannon [mailto:austinxxh-m...@yahoo.com]
> > Sent: Saturday, December 17, 2011 7:06 AM
> > To: Bart Van Assche; Zou, Yi
> > Cc: devel@open-fcoe.org
> > Subject: Re: [Open-FCoE] 3.1.4: enabling vn2vn mode triggers circular
> > locking
> >
> > While vn2vn might be broken, from various googling it does appear that
> > SCST/FCST point-to-point worked at some point in the past.
> >
> > As for kernel version vs. p2p mode, FCST has patches for kernels up to
> > 2.6.36. Those might be back-porting patches, so older kernels (e.g.
> > 2.6.35) are presumably supposed to work as well.
> >
> > I'm building 2.6.38 for both initiator and target now, and will soon see
> > whether SCST/FCST works.
> >
> > Thanks,
> > xxiao
> 
> Yeah, scst worked before, I haven't tried it for a long time...
> 
> 
> >
> > ________________________________________
> > From: Bart Van Assche <bvanass...@acm.org>
> > To: "Zou, Yi" <yi....@intel.com>
> > Cc: "devel@open-fcoe.org" <devel@open-fcoe.org>; Justin Cannon
> > <austinxxh-m...@yahoo.com>
> > Sent: Saturday, December 17, 2011 8:49 AM
> > Subject: Re: [Open-FCoE] 3.1.4: enabling vn2vn mode triggers circular
> > locking
> >
> > On Fri, Dec 9, 2011 at 10:56 PM, Zou, Yi <yi....@intel.com> wrote:
> > > You were in vn2vn mode, I assume? I did not see this in fabric mode,
> > > since it seems only the vn timeout path holds the ctrl lock and then
> > > does the mac update. I don't think we need the rtnl lock to be held for
> > > fcoe_ctlr_link_up() in fcoe_create, and I don't remember why it is...
> > > but the race seems to still be there from the device notification
> > > callback, which comes in with the rtnl lock held...
> > >
> > > Alternatively, the rtnl lock is needed for dev_uc/mc_add/del in fcoe.c,
> > > so for that reason the ctrl lock can be dropped for update_mac().
> > > Logically, however, we should hold the ctrl lock instead of rtnl for
> > > update_mac(), but fcoe is using the netdev uc/mc updating calls...
> >
> > I'm afraid that there is more that's broken with vn2vn mode than just
> > locking. Shortly after I enabled vn2vn mode on an initiator system a
> > NULL pointer dereference was triggered on the target system running
> > fcst. My interpretation of the call stack below is that this issue is
> > caused by the fcoe transport code and not by fcst. Is that
> > interpretation correct ?
> The bug seems to be that fcst's ft_prli_locked() assumes the fc_els_spp
> service parameter is never NULL; your R12 (*rspp) clearly indicates
> otherwise. And the backtrace confirms that fc_rport_enter_prli() is passing
> NULL to prli.
> 
> tcm_fc has the fix, which should be ported to fcst for this bug. If you
> take a look at drivers/target/tcm_fc/tfc_sess.c, it basically checks
> whether rspp is NULL first.
> 
> I am still trying to get back to the other bug you identified in VN2VN
> mode once I get a vn2vn setup. For now, if it's OK, can you try patching
> fcst locally to see whether fixing this bug unblocks you?
> 
> thanks,
> yi
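
The NULL check Yi describes can be sketched in plain user-space C as follows. This is not the fcst or tcm_fc source: struct spp_page is a simplified stand-in for the kernel's struct fc_els_spp, and prli_check(), SPP_INVALID_MASK and the result enum are hypothetical names chosen for illustration. The flags field is placed at offset 2 because the faulting address in the oops (CR2 = 0000000000000002) is consistent with reading a field at that offset through a NULL pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's service parameter page. */
struct spp_page {
    uint8_t type;
    uint8_t type_ext;
    uint8_t flags;      /* offset 2, the byte read first in this sketch */
    uint8_t reserved;
};

#define SPP_INVALID_MASK 0xc0   /* hypothetical: flag bits this sketch rejects */

enum prli_result { PRLI_ACCEPT, PRLI_REJECT };

/*
 * rspp is the page received from the remote port, or NULL when the caller
 * only wants the local page filled in (the case that fc_rport_enter_prli()
 * hits in the trace, according to the message above).
 * spp is the page to fill in and must not be NULL.
 */
static enum prli_result prli_check(const struct spp_page *rspp,
                                   struct spp_page *spp)
{
    assert(spp != NULL);

    spp->flags = 0;

    /* The guard: nothing to validate when no request page was passed. */
    if (rspp == NULL)
        return PRLI_ACCEPT;

    if (rspp->flags & SPP_INVALID_MASK)
        return PRLI_REJECT;

    /* Echo back the remaining request flags in the response page. */
    spp->flags = rspp->flags;
    return PRLI_ACCEPT;
}
```

Without the rspp == NULL test, the first read of rspp->flags is the kind of access that faults when the pointer is NULL.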
> 
> >
> > BUG: unable to handle kernel NULL pointer dereference at 0000000000000002
> > IP: [<ffffffffa04b282b>] ft_prli+0x4b/0x350 [fcst]
> > PGD 1a7ba4067 PUD 1a7ba3067 PMD 0
> > Oops: 0000 [#1] SMP
> > CPU 0
> > Modules linked in: netconsole configfs ib_srpt fcst scst_vdisk scst
> > crc32c libcrc32c fcoe libfcoe libfc scsi_transport_fc snd_pcm_oss
> > snd_mixer_oss snd_seq snd_seq_device af_packet rdma_ucm rdma_cm iw_cm
> > ib_addr ib_ipoib ib_cm ib_sa ib_uverbs ib_umad mlx4_ib ib_mad ib_core
> > microcode cpufreq_conservative cpufreq_userspace cpufreq_powersave
> > acpi_cpufreq mperf dm_mod snd_hda_codec_hdmi snd_hda_codec_analog
> > snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer snd intel_agp
> > mlx4_core sr_mod sg intel_gtt cdrom soundcore i2c_i801 agpgart
> > snd_page_alloc i2c_core pcspkr button uhci_hcd sd_mod crc_t10dif
> > ehci_hcd usbcore edd ext3 mbcache jbd fan ata_generic ata_piix
> > pata_marvell ahci libahci libata scsi_mod thermal processor
> > thermal_sys hwmon [last unloaded: scst]
> >
> > Pid: 3562, comm: fcoethread/0 Not tainted 3.1.5-debug+ #1 System
> > manufacturer P5Q DELUXE/P5Q DELUXE
> > RIP: 0010:[<ffffffffa04b282b>]  [<ffffffffa04b282b>] ft_prli+0x4b/0x350
> > [fcst]
> > RSP: 0018:ffff8801a6273b70  EFLAGS: 00010282
> > RAX: ffff8801a6273fd8 RBX: 0000000000000000 RCX: 0000000000000006
> > RDX: 0000000000000001 RSI: 2222222222222222 RDI: 2222222222222222
> > RBP: ffff8801a6273be0 R08: 2222222222222222 R09: 2222222222222222
> > R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000004
> > R13: ffff8801a6291c7c R14: ffff8801a6290800 R15: ffff8801a6290848
> > FS:  0000000000000000(0000) GS:ffff8801bfc00000(0000)
> > knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > CR2: 0000000000000002 CR3: 00000001a7ba1000 CR4: 00000000000406f0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > Process fcoethread/0 (pid: 3562, threadinfo ffff8801a6272000, task
> > ffff8801b13ecce0)
> > Stack:
> > ffff8801a6273bc0 ffffffff812f7593 00000000000000c0 ffff8801b9002a00
> > 0000000000000100 000000000000002c 0000000000000000 ffff8801ae1a4a18
> > ffff8801b1a1d600 ffff8801a6290800 ffff8801b1a1ce00 ffff8801ae1a4a18
> > Call Trace:
> > [<ffffffff812f7593>] ? __alloc_skb+0x83/0x170
> > [<ffffffffa03b95ec>] fc_rport_enter_prli+0xec/0x220 [libfc]
> > [<ffffffffa03ba531>] fc_rport_recv_req+0x541/0x1280 [libfc]
> > [<ffffffff81082e8d>] ? trace_hardirqs_on_caller+0x11d/0x1b0
> > [<ffffffff813d44ad>] ? mutex_lock_nested+0x26d/0x330
> > [<ffffffffa03b6c00>] ? fc_lport_recv_els_req+0x30/0x140 [libfc]
> > [<ffffffffa03b6c1f>] fc_lport_recv_els_req+0x4f/0x140 [libfc]
> > [<ffffffffa03b5f14>] fc_lport_recv_req+0x174/0x230 [libfc]
> > [<ffffffffa03b5dd1>] ? fc_lport_recv_req+0x31/0x230 [libfc]
> > [<ffffffff81082f2d>] ? trace_hardirqs_on+0xd/0x10
> > [<ffffffffa03b2f6c>] fc_exch_recv+0x63c/0xe50 [libfc]
> > [<ffffffffa03ce3b8>] fcoe_recv_frame+0x1d8/0x410 [fcoe]
> > [<ffffffff81082e8d>] ? trace_hardirqs_on_caller+0x11d/0x1b0
> > [<ffffffffa03ceaf8>] ? fcoe_percpu_receive_thread+0x68/0xf0 [fcoe]
> > [<ffffffff8104caf7>] ? local_bh_enable_ip+0x87/0xf0
> > [<ffffffffa03ceb00>] fcoe_percpu_receive_thread+0x70/0xf0 [fcoe]
> > [<ffffffffa03cea90>] ? fcoe_rcv+0x450/0x450 [fcoe]
> > [<ffffffff81069656>] kthread+0x96/0xa0
> > [<ffffffff813e0b74>] kernel_thread_helper+0x4/0x10
> > [<ffffffff813d6d9d>] ? retint_restore_args+0xe/0xe
> > [<ffffffff810695c0>] ? __init_kthread_worker+0x70/0x70
> > [<ffffffff813e0b70>] ? gs_change+0xb/0xb
> > Code: 90 f6 05 e8 2c 00 00 02 49 89 fe 48 89 d3 49 89 cd 0f 85 8e 02
> > 00 00 31 f6 48 c7 c7 00 50 4b a0 41 bc 04 00 00 00 e8 15 1a f2 e0 <0f>
> > b6 43 02 a8 c0 75 65 8b 4b 0c 41 b4 08 0f c9 f6 c1 30 74 58
> > RIP  [<ffffffffa04b282b>] ft_prli+0x4b/0x350 [fcst]
> > RSP <ffff8801a6273b70>
> > CR2: 0000000000000002
> > ---[ end trace c06e7c64e9c18831 ]---
> >
> > The above call stack was obtained after having applied the following
> > patch on initiator and target (on top of Linux kernel v3.1.5):
> >
> > NOTE: THIS PATCH INTRODUCES A RACE CONDITION !
> >
> > ---
> > drivers/scsi/fcoe/fcoe.c      |  12 ++++++++----
> > drivers/scsi/fcoe/fcoe_ctlr.c |    2 ++
> > 2 files changed, 10 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/scsi/fcoe/fcoe.c b/drivers/scsi/fcoe/fcoe.c
> > index 5d0e9a2..2648454 100644
> > --- a/drivers/scsi/fcoe/fcoe.c
> > +++ b/drivers/scsi/fcoe/fcoe.c
> > @@ -1986,7 +1986,7 @@ static bool fcoe_match(struct net_device *netdev)
> >  */
> > static int fcoe_create(struct net_device *netdev, enum fip_state fip_mode)
> > {
> > -    int rc = 0;
> > +    int rc = 0, link_status;
> >    struct fcoe_interface *fcoe;
> >    struct fc_lport *lport;
> >
> > @@ -2024,14 +2024,18 @@ static int fcoe_create(struct net_device
> > *netdev, enum fip_state fip_mode)
> >    /* start FIP Discovery and FLOGI */
> >    lport->boot_time = jiffies;
> >    fc_fabric_login(lport);
> > -    if (!fcoe_link_ok(lport))
> > +    link_status = fcoe_link_ok(lport);
> > +    rtnl_unlock();
> > +    if (link_status == 0)
> >        fcoe_ctlr_link_up(&fcoe->ctlr);
> >
> > -out_nodev:
> > -    rtnl_unlock();
> > out_nortnl:
> >    mutex_unlock(&fcoe_config_mutex);
> >    return rc;
> > +
> > +out_nodev:
> > +    rtnl_unlock();
> > +    goto out_nortnl;
> > }
> >
> > /**
> > diff --git a/drivers/scsi/fcoe/fcoe_ctlr.c
> > b/drivers/scsi/fcoe/fcoe_ctlr.c
> > index c74c4b8..e6301af 100644
> > --- a/drivers/scsi/fcoe/fcoe_ctlr.c
> > +++ b/drivers/scsi/fcoe/fcoe_ctlr.c
> > @@ -2642,7 +2642,9 @@ static void fcoe_ctlr_vn_timeout(struct fcoe_ctlr
> > *fip)
> >        hton24(mac, FIP_VN_FC_MAP);
> >        hton24(mac + 3, new_port_id);
> >        fcoe_ctlr_map_dest(fip);
> > +        mutex_unlock(&fip->ctlr_mutex);
> >        fip->update_mac(fip->lp, mac);
> > +        mutex_lock(&fip->ctlr_mutex);
> >        fcoe_ctlr_vn_send_claim(fip);
> >        next_time = jiffies + msecs_to_jiffies(FIP_VN_ANN_WAIT);
> >        break;
> > --
> > 1.7.3.4
> 
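
To make the lock-ordering issue behind the patch above easier to follow, here is a small user-space sketch of a lock-order tracker. It is a toy model and not the kernel's lockdep: it records which lock was taken while another was held and counts an inversion when the opposite order is seen later. The two paths are modelled on the discussion in this thread (the create path taking the controller lock under rtnl, and the vn timeout path calling update_mac(), which needs rtnl, under the controller lock). All function and enum names are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum lock_id { LOCK_RTNL, LOCK_CTLR, LOCK_COUNT };

static bool held[LOCK_COUNT];
/* order[a][b] is true once lock b has been taken while lock a was held. */
static bool order[LOCK_COUNT][LOCK_COUNT];
static int inversions;

static void tracker_reset(void)
{
    memset(held, 0, sizeof(held));
    memset(order, 0, sizeof(order));
    inversions = 0;
}

static void acquire(enum lock_id id)
{
    assert(!held[id]);
    for (int a = 0; a < LOCK_COUNT; a++) {
        if (!held[a])
            continue;
        if (order[id][a])       /* opposite order was recorded earlier */
            inversions++;
        order[a][id] = true;
    }
    held[id] = true;
}

static void release(enum lock_id id)
{
    assert(held[id]);
    held[id] = false;
}

/* Models the create path: link-up work needs the controller lock. */
static void create_path(bool patched)
{
    acquire(LOCK_RTNL);
    if (patched)
        release(LOCK_RTNL);     /* patch: drop rtnl before link-up */
    acquire(LOCK_CTLR);
    release(LOCK_CTLR);
    if (!patched)
        release(LOCK_RTNL);
}

/* Models the vn timeout path: the MAC update needs rtnl. */
static void vn_timeout_path(bool patched)
{
    acquire(LOCK_CTLR);
    if (patched)
        release(LOCK_CTLR);     /* patch: drop ctlr around update_mac() */
    acquire(LOCK_RTNL);
    release(LOCK_RTNL);
    if (patched)
        acquire(LOCK_CTLR);     /* state may have changed while unlocked */
    release(LOCK_CTLR);
}

static int count_inversions(bool patched)
{
    tracker_reset();
    create_path(patched);
    vn_timeout_path(patched);
    return inversions;
}
```

In this model count_inversions(false) reports one inversion (rtnl then ctlr on one path, ctlr then rtnl on the other), and count_inversions(true) reports none. The price of the patched ordering is visible in vn_timeout_path(): the controller lock is released in the middle of the state update, which is presumably the kind of window the "introduces a race condition" note refers to.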

_______________________________________________
devel mailing list
devel@open-fcoe.org
https://lists.open-fcoe.org/mailman/listinfo/devel
