On Fri, 2009-08-14 at 12:26 -0700, Joe Eykholt wrote:
> Robert Love wrote:
> > On Mon, 2009-08-10 at 12:07 -0700, Joe Eykholt wrote:
> >> libfc receives PLOGIs from switches which are trying to discover what
> >> kind of devices are present, and from other initiators to find out
> >> if we're a target.
> >>
> >> As an initiator, some argue we don't need to handle incoming PLOGI
> >> requests, and we currently reject them from unknown remote ports,
> >> but accept them if we're in the middle of a PLOGI to the remote port.
> >>
> >> For eventual target implementations, we want to handle them always.
> >>
> >> For incoming PLOGI, don't fail if the rport_priv doesn't exist.
> >> Just create it and become READY without going through PRLI. If
> >> PRLI occurs, then our roles will be set and we'll become READY again.
> >>
> >> Also, allow incoming PRLI in RTV state.
> >>
> >> Signed-off-by: Joe Eykholt <[email protected]>
> >> ---
> > Hi Joe,
> >
> > I'm having problems with this patch. I don't have many details at this
> > point. What I see is that after a 'create' the debug_logging shows the
> > stack doing this-
> >
> > Aug 14 09:33:59 localhost kernel: [82642.117998] host212: rport ce0300: Received a PRLI accept
> > Aug 14 09:33:59 localhost kernel: [82642.118208] host212: rport ce0300: Port entered RTV state from PRLI state
> > Aug 14 09:33:59 localhost kernel: [82642.118438] host212: xid 35: Exchange timer armed
> > Aug 14 09:33:59 localhost kernel: [82642.118678] host212: xid 35: f_ctl 90000 seq 1
> > Aug 14 09:33:59 localhost kernel: [82642.118917] host212: rport ce0300: Received a RTV reject
> > Aug 14 09:33:59 localhost kernel: [82642.119127] host212: rport ce0300: Port is Ready
> > Aug 14 09:33:59 localhost kernel: [82642.119349] host212: rport ce0300: work event 1
> >
> > rport ce0300 is a target. At this point the initiator just sits there. I
> > do not see any LUNs with 'fdisk -l'.
> > I see this message a few seconds
> > later, but at this point I think it's unrelated-
>
> I can't reproduce the problem.
>
> Assuming that's the target with the problem, it should be ready and
> the 'work event 1' indicates it would create the transport rport.
>
> Since the patch is PLOGI-request related, and targets don't send PLOGIs,
> it should not be an issue with this patch. Is it reproducible for you?
>

Yes, it's reproducible, not 100% of the time, but fairly regularly. I
can't say with certainty that it's this patch, but I haven't reproduced
the problem with earlier patches.

I ran fcoedump.sh when it was in the stalled state. You can see-

/sys/class/fc_remote_ports/rport-61:0-1/uevent
/sys/class/fc_remote_ports/rport-61:0-1/power/wakeup
/sys/class/fc_remote_ports/rport-61:0-1/maxframe_size 2112 bytes
/sys/class/fc_remote_ports/rport-61:0-1/supported_classes unspecified
/sys/class/fc_remote_ports/rport-61:0-1/dev_loss_tmo 60
/sys/class/fc_remote_ports/rport-61:0-1/node_name 0x2001000deca31e81
/sys/class/fc_remote_ports/rport-61:0-1/port_name 0x2511000deca31e80
/sys/class/fc_remote_ports/rport-61:0-1/port_id 0xfffccf
/sys/class/fc_remote_ports/rport-61:0-1/roles FCP Target, FCP Initiator
/sys/class/fc_remote_ports/rport-61:0-1/port_state Blocked
/sys/class/fc_remote_ports/rport-61:0-1/scsi_target_id 0
/sys/class/fc_remote_ports/rport-61:0-1/fast_io_fail_tmo off

0xfffccf is the switch's FC management port. Previously I hadn't seen
this FCID, so I assume that the switch PLOGIs into the initiator (since
your patches add that support). I need to confirm this with a Wireshark
trace.

What I don't understand is why this influences ce0300, which is a
CLARiiON target. I especially don't understand why ce0300 becoming
blocked wakes things up.

> Could you run the fcc script during that interval where you don't
> see the LUNs? A wire trace might also help. I'll send you the
> latest fcc script separately.
>
> > Aug 14 09:34:17 localhost kernel: [82660.023560] host212: xid 4: Exchange timed out
> >
> > The system will stay without LUNs until I see this message-
> >
> > Aug 14 09:34:59 localhost kernel: [82701.952295] rport-212:0-1: blocked FC remote port time out: removing target and saving binding
> >
> > which is somehow triggering SCSI to start sending SCSI commands.
> >
> > Aug 14 09:34:59 localhost kernel: [82701.952892] host212: xid 3d: f_ctl 90000 seq 1
> > Aug 14 09:34:59 localhost kernel: [82701.953110] host212: xid 3d: f_ctl 90000 seq 2
> > Aug 14 09:34:59 localhost kernel: [82701.953393] host212: xid 45: f_ctl 90000 seq 1
> > Aug 14 09:34:59 localhost kernel: [82701.953614] host212: xid 45: f_ctl 90000 seq 2
> > Aug 14 09:34:59 localhost kernel: [82701.953832] scsi 212:0:1:0: Direct-Access DGC RAID 5 0326 PQ: 0 ANSI: 4
> > Aug 14 09:34:59 localhost kernel: [82701.954351] sd 212:0:1:0: Attached scsi generic sg1 type 0
> > Aug 14 09:34:59 localhost kernel: [82701.954448] host212: xid 2: f_ctl 90000 seq 1
> > Aug 14 09:34:59 localhost kernel: [82701.954523] host212: xid a: f_ctl 90000 seq 1
> > Aug 14 09:34:59 localhost kernel: [82701.954593] host212: xid 12: f_ctl 90000 seq 1
> > Aug 14 09:34:59 localhost kernel: [82701.954596] host212: xid 12: f_ctl 90000 seq 2
> > Aug 14 09:34:59 localhost kernel: [82701.954606] sd 212:0:1:0: [sdb] 8388608 512-byte logical blocks: (4.29 GB/4.00 GiB)
> >
> > I'll poke around in your patch and try to figure out what's going on. I
> > would appreciate it if you could take a look too.
> >
> > Thanks, //Rob
> >
>
> _______________________________________________
> devel mailing list
> [email protected]
> http://www.open-fcoe.org/mailman/listinfo/devel
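
For anyone trying to catch the stalled state described in this thread, the
sysfs attributes in the fcoedump.sh output can also be polled directly. The
following is only a rough sketch, not part of fcoedump.sh or the fcc script:
the function names read_attr and blocked_rports are made up for illustration,
and it assumes the attribute files (port_state, port_id, roles) appear under
/sys/class/fc_remote_ports as they do in the dump.

```python
from pathlib import Path


def read_attr(rport_dir, name):
    """Return the stripped text of one sysfs attribute, or None if unreadable."""
    try:
        return (Path(rport_dir) / name).read_text().strip()
    except OSError:
        return None


def blocked_rports(base="/sys/class/fc_remote_ports"):
    """Return (rport name, port_id, roles) for each rport whose port_state is Blocked.

    The base directory is a parameter (hypothetical helper, not a libfc API)
    so the function can be pointed at a copy of the sysfs tree.
    """
    base = Path(base)
    if not base.is_dir():
        return []
    found = []
    for rport in sorted(base.iterdir()):
        if read_attr(rport, "port_state") == "Blocked":
            found.append(
                (rport.name, read_attr(rport, "port_id"), read_attr(rport, "roles"))
            )
    return found


if __name__ == "__main__":
    for name, port_id, roles in blocked_rports():
        print(f"{name}: port_id={port_id} roles={roles}")
```

Running it in a loop while the initiator is stalled would show which remote
port is sitting in the Blocked state and for how long, which may help line
the sysfs view up with the kernel log timestamps.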
