> From: Justin Cannon [mailto:austinxxh-m...@yahoo.com]
> Sent: Monday, December 19, 2011 3:12 PM
> To: Zou, Yi; Bart Van Assche
> Cc: devel@open-fcoe.org
> Subject: Re: [Open-FCoE] 3.1.4: enabling vn2vn mode triggers circular locking
>
> I could not get p2p (not vn2vn) working using 2.6.34/35/36 with scst/fcst.
> It failed with 'no FCoE interface' after it detected no HBA. In my case I'm
> using Ethernet only, so I wonder why it quits after it sees no HBA: the
> Ethernet NIC should be its HBA now, and the kernel sees it and reports the
> link is up.

Can you just do 'echo ethx > /sys/module/fcoe/parameters/create'? That's what I remember working. fcoe_transport was added around the 2.6.38/39 time frame as a common interface through which fcoe-utils can create instances on all underlying LLDs (fcoeadm talks to the fcoe service and hbalib), but you don't necessarily need it to get create going.
yi

>
> I'm going to try the in-kernel code (RTS code), but I got lost immediately
> in the git location and the naming changes (lio, tcm, rtsadmin, targetcli,
> python dependencies, etc.), not to mention that documentation for the new
> storage stack is very hard to find, if it exists.
>
> Thanks,
> xxiao
>
> ________________________________________
> From: "Zou, Yi" <yi....@intel.com>
> To: Justin Cannon <austinxxh-m...@yahoo.com>; Bart Van Assche <bvanass...@acm.org>
> Cc: "devel@open-fcoe.org" <devel@open-fcoe.org>
> Sent: Monday, December 19, 2011 4:28 PM
> Subject: RE: [Open-FCoE] 3.1.4: enabling vn2vn mode triggers circular locking
>
> > From: Justin Cannon [mailto:austinxxh-m...@yahoo.com]
> > Sent: Saturday, December 17, 2011 7:06 AM
> > To: Bart Van Assche; Zou, Yi
> > Cc: devel@open-fcoe.org
> > Subject: Re: [Open-FCoE] 3.1.4: enabling vn2vn mode triggers circular locking
> >
> > While vn2vn might be broken, from various googling it does appear that
> > SCST/FCST point-to-point somehow worked in the past.
> >
> > As for kernel version vs. p2p mode, FCST has patches for kernels up to
> > 2.6.36; those might be backport patches, so older kernels (e.g. 2.6.35)
> > should presumably work as well.
> >
> > I'm building 2.6.38 for both initiator and target now, and will soon see
> > whether SCST/FCST works.
> >
> > Thanks,
> > xxiao
>
> Yeah, scst worked before; I haven't tried it in a long time...
>
> > ________________________________________
> > From: Bart Van Assche <bvanass...@acm.org>
> > To: "Zou, Yi" <yi....@intel.com>
> > Cc: "devel@open-fcoe.org" <devel@open-fcoe.org>; Justin Cannon <austinxxh-m...@yahoo.com>
> > Sent: Saturday, December 17, 2011 8:49 AM
> > Subject: Re: [Open-FCoE] 3.1.4: enabling vn2vn mode triggers circular locking
> >
> > On Fri, Dec 9, 2011 at 10:56 PM, Zou, Yi <yi....@intel.com> wrote:
> > > You were in vn2vn mode, I assume?
> > > I did not see this in fabric mode, since it seems only the vn timeout
> > > path holds the ctrl lock and then does the MAC update. I don't think we
> > > need the rtnl lock to be held for fcoe_ctlr_link_up() in fcoe_create()
> > > (I don't remember why it is...), but the race still seems to be there
> > > via the device notification callback, which comes in with the rtnl lock
> > > held...
> > >
> > > Alternatively, since the rtnl lock is needed for dev_uc/mc_add/del in
> > > fcoe.c, the ctrl lock can be dropped around update_mac(). Logically,
> > > however, we should hold the ctrl lock instead of rtnl for update_mac(),
> > > but fcoe is using the netdev uc/mc update calls...
> >
> > I'm afraid that more is broken with vn2vn mode than just locking.
> > Shortly after I enabled vn2vn mode on an initiator system, a NULL pointer
> > dereference was triggered on the target system running fcst. My
> > interpretation of the call stack below is that this issue is caused by
> > the fcoe transport code and not by fcst. Is that interpretation correct?
>
> The bug seems to be that fcst's ft_prli_locked() assumes the fc_els_spp
> service parameter page rspp is never NULL; your registers clearly show that
> it is: RBX, which holds rspp at the faulting instruction, is 0, and
> CR2 = 0x2 is the address of rspp->spp_flags. The backtrace confirms that
> fc_rport_enter_prli() passes NULL as rspp to the prli handler.
>
> tcm_fc has the fix that should be ported to fcst for this bug: take a look
> at drivers/target/tcm_fc/tfc_sess.c, which basically checks whether rspp
> is NULL first.
>
> I am still trying to get back to the other bug you identified in VN2VN mode
> once I get a vn2vn setup. For now, if it's OK, can you try patching fcst
> locally to see whether that unblocks you?
>
> thanks,
> yi
>
> >
> > BUG: unable to handle kernel NULL pointer dereference at 0000000000000002
> > IP: [<ffffffffa04b282b>] ft_prli+0x4b/0x350 [fcst]
> > PGD 1a7ba4067 PUD 1a7ba3067 PMD 0
> > Oops: 0000 [#1] SMP
> > CPU 0
> > Modules linked in: netconsole configfs ib_srpt fcst scst_vdisk scst
> > crc32c libcrc32c fcoe libfcoe libfc scsi_transport_fc snd_pcm_oss
> > snd_mixer_oss snd_seq snd_seq_device af_packet rdma_ucm rdma_cm iw_cm
> > ib_addr ib_ipoib ib_cm ib_sa ib_uverbs ib_umad mlx4_ib ib_mad ib_core
> > microcode cpufreq_conservative cpufreq_userspace cpufreq_powersave
> > acpi_cpufreq mperf dm_mod snd_hda_codec_hdmi snd_hda_codec_analog
> > snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer snd intel_agp
> > mlx4_core sr_mod sg intel_gtt cdrom soundcore i2c_i801 agpgart
> > snd_page_alloc i2c_core pcspkr button uhci_hcd sd_mod crc_t10dif
> > ehci_hcd usbcore edd ext3 mbcache jbd fan ata_generic ata_piix
> > pata_marvell ahci libahci libata scsi_mod thermal processor
> > thermal_sys hwmon [last unloaded: scst]
> >
> > Pid: 3562, comm: fcoethread/0 Not tainted 3.1.5-debug+ #1 System manufacturer P5Q DELUXE/P5Q DELUXE
> > RIP: 0010:[<ffffffffa04b282b>] [<ffffffffa04b282b>] ft_prli+0x4b/0x350 [fcst]
> > RSP: 0018:ffff8801a6273b70 EFLAGS: 00010282
> > RAX: ffff8801a6273fd8 RBX: 0000000000000000 RCX: 0000000000000006
> > RDX: 0000000000000001 RSI: 2222222222222222 RDI: 2222222222222222
> > RBP: ffff8801a6273be0 R08: 2222222222222222 R09: 2222222222222222
> > R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000004
> > R13: ffff8801a6291c7c R14: ffff8801a6290800 R15: ffff8801a6290848
> > FS: 0000000000000000(0000) GS:ffff8801bfc00000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > CR2: 0000000000000002 CR3: 00000001a7ba1000 CR4: 00000000000406f0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > Process fcoethread/0 (pid: 3562, threadinfo ffff8801a6272000, task ffff8801b13ecce0)
> > Stack:
> >  ffff8801a6273bc0 ffffffff812f7593 00000000000000c0 ffff8801b9002a00
> >  0000000000000100 000000000000002c 0000000000000000 ffff8801ae1a4a18
> >  ffff8801b1a1d600 ffff8801a6290800 ffff8801b1a1ce00 ffff8801ae1a4a18
> > Call Trace:
> >  [<ffffffff812f7593>] ? __alloc_skb+0x83/0x170
> >  [<ffffffffa03b95ec>] fc_rport_enter_prli+0xec/0x220 [libfc]
> >  [<ffffffffa03ba531>] fc_rport_recv_req+0x541/0x1280 [libfc]
> >  [<ffffffff81082e8d>] ? trace_hardirqs_on_caller+0x11d/0x1b0
> >  [<ffffffff813d44ad>] ? mutex_lock_nested+0x26d/0x330
> >  [<ffffffffa03b6c00>] ? fc_lport_recv_els_req+0x30/0x140 [libfc]
> >  [<ffffffffa03b6c1f>] fc_lport_recv_els_req+0x4f/0x140 [libfc]
> >  [<ffffffffa03b5f14>] fc_lport_recv_req+0x174/0x230 [libfc]
> >  [<ffffffffa03b5dd1>] ? fc_lport_recv_req+0x31/0x230 [libfc]
> >  [<ffffffff81082f2d>] ? trace_hardirqs_on+0xd/0x10
> >  [<ffffffffa03b2f6c>] fc_exch_recv+0x63c/0xe50 [libfc]
> >  [<ffffffffa03ce3b8>] fcoe_recv_frame+0x1d8/0x410 [fcoe]
> >  [<ffffffff81082e8d>] ? trace_hardirqs_on_caller+0x11d/0x1b0
> >  [<ffffffffa03ceaf8>] ? fcoe_percpu_receive_thread+0x68/0xf0 [fcoe]
> >  [<ffffffff8104caf7>] ? local_bh_enable_ip+0x87/0xf0
> >  [<ffffffffa03ceb00>] fcoe_percpu_receive_thread+0x70/0xf0 [fcoe]
> >  [<ffffffffa03cea90>] ? fcoe_rcv+0x450/0x450 [fcoe]
> >  [<ffffffff81069656>] kthread+0x96/0xa0
> >  [<ffffffff813e0b74>] kernel_thread_helper+0x4/0x10
> >  [<ffffffff813d6d9d>] ? retint_restore_args+0xe/0xe
> >  [<ffffffff810695c0>] ? __init_kthread_worker+0x70/0x70
> >  [<ffffffff813e0b70>] ? gs_change+0xb/0xb
> > Code: 90 f6 05 e8 2c 00 00 02 49 89 fe 48 89 d3 49 89 cd 0f 85 8e 02
> > 00 00 31 f6 48 c7 c7 00 50 4b a0 41 bc 04 00 00 00 e8 15 1a f2 e0 <0f>
> > b6 43 02 a8 c0 75 65 8b 4b 0c 41 b4 08 0f c9 f6 c1 30 74 58
> > RIP [<ffffffffa04b282b>] ft_prli+0x4b/0x350 [fcst]
> >  RSP <ffff8801a6273b70>
> > CR2: 0000000000000002
> > ---[ end trace c06e7c64e9c18831 ]---
> >
> > The above call stack was obtained after having applied the following
> > patch on initiator and target (on top of Linux kernel v3.1.5):
> >
> > NOTE: THIS PATCH INTRODUCES A RACE CONDITION !
> >
> > ---
> >  drivers/scsi/fcoe/fcoe.c      |   12 ++++++++----
> >  drivers/scsi/fcoe/fcoe_ctlr.c |    2 ++
> >  2 files changed, 10 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/scsi/fcoe/fcoe.c b/drivers/scsi/fcoe/fcoe.c
> > index 5d0e9a2..2648454 100644
> > --- a/drivers/scsi/fcoe/fcoe.c
> > +++ b/drivers/scsi/fcoe/fcoe.c
> > @@ -1986,7 +1986,7 @@ static bool fcoe_match(struct net_device *netdev)
> >   */
> >  static int fcoe_create(struct net_device *netdev, enum fip_state fip_mode)
> >  {
> > -	int rc = 0;
> > +	int rc = 0, link_status;
> >  	struct fcoe_interface *fcoe;
> >  	struct fc_lport *lport;
> >
> > @@ -2024,14 +2024,18 @@ static int fcoe_create(struct net_device *netdev, enum fip_state fip_mode)
> >  	/* start FIP Discovery and FLOGI */
> >  	lport->boot_time = jiffies;
> >  	fc_fabric_login(lport);
> > -	if (!fcoe_link_ok(lport))
> > +	link_status = fcoe_link_ok(lport);
> > +	rtnl_unlock();
> > +	if (link_status == 0)
> >  		fcoe_ctlr_link_up(&fcoe->ctlr);
> >
> > -out_nodev:
> > -	rtnl_unlock();
> >  out_nortnl:
> >  	mutex_unlock(&fcoe_config_mutex);
> >  	return rc;
> > +
> > +out_nodev:
> > +	rtnl_unlock();
> > +	goto out_nortnl;
> >  }
> >
> >  /**
> > diff --git a/drivers/scsi/fcoe/fcoe_ctlr.c b/drivers/scsi/fcoe/fcoe_ctlr.c
> > index c74c4b8..e6301af 100644
> > --- a/drivers/scsi/fcoe/fcoe_ctlr.c
> > +++ b/drivers/scsi/fcoe/fcoe_ctlr.c
> > @@ -2642,7 +2642,9 @@ static void fcoe_ctlr_vn_timeout(struct fcoe_ctlr *fip)
> >  		hton24(mac, FIP_VN_FC_MAP);
> >  		hton24(mac + 3, new_port_id);
> >  		fcoe_ctlr_map_dest(fip);
> > +		mutex_unlock(&fip->ctlr_mutex);
> >  		fip->update_mac(fip->lp, mac);
> > +		mutex_lock(&fip->ctlr_mutex);
> >  		fcoe_ctlr_vn_send_claim(fip);
> >  		next_time = jiffies + msecs_to_jiffies(FIP_VN_ANN_WAIT);
> >  		break;
> > --
> > 1.7.3.4

_______________________________________________
devel mailing list
devel@open-fcoe.org
https://lists.open-fcoe.org/mailman/listinfo/devel
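The circular-locking discussion in this thread comes down to two lock orders: the vn2vn timeout path holds the ctlr mutex and then, through update_mac(), takes the rtnl lock (which Yi notes dev_uc/mc_add/del need), while fcoe_create() and the netdev notifier hold rtnl and then take the ctlr mutex in fcoe_ctlr_link_up(). The toy C model below records "B taken while A held" edges between two pretend locks and flags a cycle. That is roughly the idea behind the kernel's circular-locking report, not lockdep's implementation, and every name in it (`order_graph`, `model_lock`, the path functions) is hypothetical. It only tracks ordering: as Bart's NOTE warns, dropping the ctlr mutex around update_mac() opens a window in which controller state can change, and this model does not capture that race.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Two pretend locks standing in for the RTNL lock and fip->ctlr_mutex. */
enum lock_id { LOCK_RTNL, LOCK_CTLR, NR_LOCKS };

/* Toy lock-order tracker (hypothetical; not lockdep's implementation). */
struct order_graph {
	bool after[NR_LOCKS][NR_LOCKS]; /* after[a][b]: b was taken while a was held */
	bool held[NR_LOCKS];            /* locks held by the single modeled context */
	bool cycle_found;
};

static void graph_reset(struct order_graph *g)
{
	memset(g, 0, sizeof(*g));
}

/* Depth-first search: can 'to' be reached from 'from' along recorded edges? */
static bool reachable(const struct order_graph *g, int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int n = 0; n < NR_LOCKS; n++)
		if (g->after[from][n] && !seen[n] && reachable(g, n, to, seen))
			return true;
	return false;
}

static void model_lock(struct order_graph *g, enum lock_id l)
{
	for (int h = 0; h < NR_LOCKS; h++) {
		bool seen[NR_LOCKS] = { false };

		if (!g->held[h])
			continue;
		/* Adding edge h -> l closes a cycle if l already leads back to h. */
		if (reachable(g, l, h, seen))
			g->cycle_found = true;
		g->after[h][l] = true;
	}
	g->held[l] = true;
}

static void model_unlock(struct order_graph *g, enum lock_id l)
{
	assert(g->held[l]); /* unlocking a lock we do not hold is a bug */
	g->held[l] = false;
}

/* vn2vn timeout before the patch: update_mac() takes rtnl with ctlr held. */
static void vn_timeout_before(struct order_graph *g)
{
	model_lock(g, LOCK_CTLR);
	model_lock(g, LOCK_RTNL);   /* update_mac() -> dev_uc_add()/dev_uc_del() */
	model_unlock(g, LOCK_RTNL);
	model_unlock(g, LOCK_CTLR);
}

/* vn2vn timeout after the patch: ctlr dropped around update_mac(). */
static void vn_timeout_after(struct order_graph *g)
{
	model_lock(g, LOCK_CTLR);
	model_unlock(g, LOCK_CTLR);
	model_lock(g, LOCK_RTNL);   /* update_mac() */
	model_unlock(g, LOCK_RTNL);
	model_lock(g, LOCK_CTLR);   /* re-taken to send the claim */
	model_unlock(g, LOCK_CTLR);
}

/* fcoe_create() after the patch: rtnl released before fcoe_ctlr_link_up(). */
static void create_after(struct order_graph *g)
{
	model_lock(g, LOCK_RTNL);
	model_unlock(g, LOCK_RTNL);
	model_lock(g, LOCK_CTLR);   /* fcoe_ctlr_link_up() */
	model_unlock(g, LOCK_CTLR);
}

/* Netdev notifier: runs with rtnl held and may bring the link up (takes ctlr). */
static void notifier_link_up(struct order_graph *g)
{
	model_lock(g, LOCK_RTNL);
	model_lock(g, LOCK_CTLR);
	model_unlock(g, LOCK_CTLR);
	model_unlock(g, LOCK_RTNL);
}
```

On a fresh graph, vn_timeout_before() followed by notifier_link_up() sets cycle_found, while vn_timeout_before() followed by create_after() does not. That is Yi's point: releasing rtnl in fcoe_create() alone still leaves the notifier's rtnl-then-ctlr order, so the cycle only disappears when the timeout path stops taking rtnl under ctlr, as vn_timeout_after() does.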
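As a cross-check on the ft_prli diagnosis: the faulting bytes in the Code: line, `<0f> b6 43 02`, decode to `movzbl 0x2(%rbx),%eax`, and RBX is 0, so the handler read the byte at offset 2 of a NULL pointer (CR2 = 2). In `struct fc_els_spp` that byte is `spp_flags`. The following is a minimal userspace sketch, assuming the fc_els_spp layout from include/scsi/fc/fc_els.h as mirrored below, of a PRLI parameter-page handler that checks rspp for NULL before touching it, in the spirit of the tcm_fc check Yi points to. The names (`spp_mirror`, `model_prli`, the `SPP_*` constants) and the simplified "fill" step are hypothetical; this is not the fcst or tcm_fc code.

```c
#include <arpa/inet.h> /* ntohl, htonl */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the struct fc_els_spp layout (hypothetical name). */
struct spp_mirror {
	uint8_t  spp_type;
	uint8_t  spp_type_ext;
	uint8_t  spp_flags;      /* offset 2: the byte the oops read through NULL */
	uint8_t  spp_resp_code;
	uint32_t spp_orig_pa;    /* 32-bit fields are big-endian on the wire */
	uint32_t spp_resp_pa;
	uint32_t spp_params;     /* offset 12: read as 0xc(%rbx) in the Code: bytes */
};

_Static_assert(offsetof(struct spp_mirror, spp_flags) == 2, "spp_flags offset");
_Static_assert(offsetof(struct spp_mirror, spp_params) == 12, "spp_params offset");

/* Values chosen to match the masks seen in the oops (test $0xc0, test $0x30). */
enum {
	SPP_OPA_VAL   = 0x80, /* spp_flags: originator process associator valid */
	SPP_RPA_VAL   = 0x40, /* spp_flags: responder process associator valid */
	SPPF_INIT_FCN = 0x20, /* spp_params: initiator function */
	SPPF_TARG_FCN = 0x10, /* spp_params: target function */
};

enum { SPP_RESP_ACK = 1, SPP_RESP_NO_PA = 4, SPP_RESP_INVL = 8 };

/*
 * rspp: page received from the peer, or NULL when building our own
 * outgoing PRLI. spp: our own page, which must never be NULL.
 */
static int model_prli(const struct spp_mirror *rspp, struct spp_mirror *spp)
{
	uint32_t fcp_parm;

	assert(spp != NULL);

	if (rspp == NULL)        /* the check fcst was missing: nothing to validate */
		goto fill;

	if (rspp->spp_flags & (SPP_OPA_VAL | SPP_RPA_VAL))
		return SPP_RESP_NO_PA;

	fcp_parm = ntohl(rspp->spp_params);
	if (!(fcp_parm & (SPPF_INIT_FCN | SPPF_TARG_FCN)))
		return SPP_RESP_INVL;

fill:
	/* Advertise the target function in our own parameter page. */
	fcp_parm = ntohl(spp->spp_params);
	spp->spp_params = htonl(fcp_parm | SPPF_TARG_FCN);
	return SPP_RESP_ACK;
}
```

The `rspp == NULL` branch corresponds to the outgoing-PRLI path in the backtrace (fc_rport_enter_prli()), where, per Yi, libfc passes NULL; the handler then only fills in its own page. The constant values should be checked against the kernel headers before relying on them.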
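Yi's description of fcoe_transport is a common layer through which fcoe-utils can ask whichever low-level driver owns a NIC to create an FCoE instance. That amounts to a registry of match/create callbacks; the patch above shows fcoe.c's own fcoe_match() and fcoe_create(). The following toy userspace model of that dispatch assumes first-match-wins lookup in registration order. Every name in it (`model_transport`, `transport_attach`, `transport_create`, the `offload_nic` driver) is hypothetical, and it is not the libfcoe implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct net_device: just a name and a driver. */
struct model_netdev {
	const char *name;
	const char *driver;
};

/* Hypothetical transport: each LLD supplies a match() predicate and a create() hook. */
struct model_transport {
	const char *name;
	bool (*match)(const struct model_netdev *dev);
	int (*create)(const struct model_netdev *dev);
};

#define MAX_TRANSPORTS 4
static const struct model_transport *registry[MAX_TRANSPORTS];
static size_t nr_registered;

/* Register a transport; earlier registrations are consulted first. */
static int transport_attach(const struct model_transport *t)
{
	assert(t != NULL && t->match != NULL && t->create != NULL);
	if (nr_registered == MAX_TRANSPORTS)
		return -1;
	registry[nr_registered++] = t;
	return 0;
}

/* Hand a create request to the first transport whose match() claims the device. */
static int transport_create(const struct model_netdev *dev)
{
	for (size_t i = 0; i < nr_registered; i++)
		if (registry[i]->match(dev))
			return registry[i]->create(dev);
	return -1; /* no transport wants this device */
}

/* Example offload LLD: claims only NICs driven by the made-up "offload_nic" driver. */
static bool offload_match(const struct model_netdev *dev)
{
	return strcmp(dev->driver, "offload_nic") == 0;
}

static int offload_create(const struct model_netdev *dev)
{
	(void)dev;
	return 2; /* marker: handled by the offload transport */
}

/* Example software transport: accepts any Ethernet device. */
static bool sw_match(const struct model_netdev *dev)
{
	(void)dev;
	return true;
}

static int sw_create(const struct model_netdev *dev)
{
	(void)dev;
	return 1; /* marker: handled by the software transport */
}

static const struct model_transport offload_transport = { "offload", offload_match, offload_create };
static const struct model_transport sw_transport = { "sw", sw_match, sw_create };
```

Registering the offload transport before the catch-all software one lets a NIC claimed by its own driver bypass the generic path, while any other Ethernet device falls through to the software transport.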