On Fri, 2009-08-14 at 12:26 -0700, Joe Eykholt wrote:
> Robert Love wrote:
> > On Mon, 2009-08-10 at 12:07 -0700, Joe Eykholt wrote:
> >> libfc receives PLOGIs from switches which are trying to discover what
> >> kind of devices are present, and from other initiators to find out
> >> if we're a target.
> >>
> >> As an initiator, some argue we don't need to handle incoming PLOGI
> >> requests, and we currently reject them from unknown remote ports,
> >> but accept them if we're in the middle of a PLOGI to the remote port.
> >>
> >> For eventual target implementations, we want to handle them always.
> >>
> >> For incoming PLOGI, don't fail if the rport_priv doesn't exist.
> >> Just create it and go become READY without going through PRLI.  If
> >> PRLI occurs, then our roles will be set and we'll become READY again.
> >>
> >> Also, allow incoming PRLI in RTV state.
> >>
> >> Signed-off-by: Joe Eykholt <[email protected]>
> >> ---
> > Hi Joe,
> > 
> > I'm having problems with this patch. I don't have many details at this
> > point. What I see is that after a 'create' the debug_logging shows the
> > stack doing this-
> > 
> > Aug 14 09:33:59 localhost kernel: [82642.117998] host212: rport ce0300:
> > Received a PRLI accept
> > Aug 14 09:33:59 localhost kernel: [82642.118208] host212: rport ce0300:
> > Port entered RTV state from PRLI state
> > Aug 14 09:33:59 localhost kernel: [82642.118438] host212: xid   35:
> > Exchange timer armed
> > Aug 14 09:33:59 localhost kernel: [82642.118678] host212: xid   35:
> > f_ctl  90000 seq  1
> > Aug 14 09:33:59 localhost kernel: [82642.118917] host212: rport ce0300:
> > Received a RTV reject
> > Aug 14 09:33:59 localhost kernel: [82642.119127] host212: rport ce0300:
> > Port is Ready
> > Aug 14 09:33:59 localhost kernel: [82642.119349] host212: rport ce0300:
> > work event 1
> > 
> > rport ce0300 is a target. At this point the initiator just sits there. I
> > do not see any LUNs with 'fdisk -l'. I see this message a few seconds
> > later, but at this point I think it's unrelated-
> 
> I can't reproduce the problem.
> 
> Assuming that's the target with the problem, it should be ready and
> the 'work event 1' indicates it would create the transport rport.
> 
> Since the patch is PLOGI-request related, and targets don't send PLOGIs,
> it should not be an issue with this patch.  Is it reproducible for you?
> 
Yes, it's reproducible, not 100% of the time, but fairly regularly. I
can't say with certainty that it's this patch, but I haven't reproduced
the problem with earlier patches.

I ran fcoedump.sh when it was in the stalled state. You can see-

/sys/class/fc_remote_ports/rport-61:0-1/uevent
/sys/class/fc_remote_ports/rport-61:0-1/power/wakeup

/sys/class/fc_remote_ports/rport-61:0-1/maxframe_size
2112 bytes
/sys/class/fc_remote_ports/rport-61:0-1/supported_classes
unspecified
/sys/class/fc_remote_ports/rport-61:0-1/dev_loss_tmo
60
/sys/class/fc_remote_ports/rport-61:0-1/node_name
0x2001000deca31e81
/sys/class/fc_remote_ports/rport-61:0-1/port_name
0x2511000deca31e80
/sys/class/fc_remote_ports/rport-61:0-1/port_id
0xfffccf
/sys/class/fc_remote_ports/rport-61:0-1/roles
FCP Target, FCP Initiator
/sys/class/fc_remote_ports/rport-61:0-1/port_state
Blocked
/sys/class/fc_remote_ports/rport-61:0-1/scsi_target_id
0
/sys/class/fc_remote_ports/rport-61:0-1/fast_io_fail_tmo
off

0xfffccf is the switch's FC management port. Previously I hadn't seen
this FCID, so I assume that the switch PLOGIs into the initiator (since
your patches add that support). I need to confirm this with a wireshark
trace.
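
For anyone who wants to check for the same condition without the full
fcoedump.sh output, here is a minimal sketch that walks the
fc_remote_ports class directory shown above and reports any remote port
whose port_state is Blocked. It is not taken from fcoedump.sh; the
function names dump_rports and blocked_ports are made up for this
example, and it only assumes the attribute files listed in the dump
(port_state, port_id) are present.

```python
import os


def dump_rports(base="/sys/class/fc_remote_ports"):
    """Return {rport_name: {attribute: value}} for every readable attribute file."""
    ports = {}
    if not os.path.isdir(base):
        return ports  # class directory absent, e.g. no FC transport loaded
    for name in sorted(os.listdir(base)):
        rport_dir = os.path.join(base, name)
        if not os.path.isdir(rport_dir):
            continue
        attrs = {}
        for attr in sorted(os.listdir(rport_dir)):
            path = os.path.join(rport_dir, attr)
            if not os.path.isfile(path):
                continue  # skip subdirectories such as power/
            try:
                with open(path, errors="replace") as f:
                    attrs[attr] = f.read().strip()
            except OSError:
                continue  # write-only or otherwise unreadable attribute
        ports[name] = attrs
    return ports


def blocked_ports(ports):
    """Return (rport_name, port_id) for each port whose port_state is Blocked."""
    return [
        (name, attrs.get("port_id", "?"))
        for name, attrs in ports.items()
        if attrs.get("port_state") == "Blocked"
    ]


if __name__ == "__main__":
    for name, port_id in blocked_ports(dump_rports()):
        print(f"{name}: port_id {port_id} is Blocked")
```

The base directory is a parameter so the same code can be pointed at a
saved copy of the sysfs tree rather than a live system.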

What I don't understand is why this influences ce0300, which is a
CLARiiON target. I especially don't understand why ce0300 becoming
blocked wakes things up.


> Could you run the fcc script during that interval where you don't
> see the LUNs?  A wire trace might also help.  I'll send you the
> latest fcc script separately.
> 
> > Aug 14 09:34:17 localhost kernel: [82660.023560] host212: xid    4:
> > Exchange timed out
> > 
> > The system will stay without LUNs until I see this message-
> > 
> > Aug 14 09:34:59 localhost kernel: [82701.952295]  rport-212:0-1: blocked
> > FC remote port time out: removing target and saving binding
> > 
> > which is somehow triggering SCSI to start sending SCSI commands.
> > 
> > Aug 14 09:34:59 localhost kernel: [82701.952892] host212: xid   3d:
> > f_ctl  90000 seq  1
> > Aug 14 09:34:59 localhost kernel: [82701.953110] host212: xid   3d:
> > f_ctl  90000 seq  2
> > Aug 14 09:34:59 localhost kernel: [82701.953393] host212: xid   45:
> > f_ctl  90000 seq  1
> > Aug 14 09:34:59 localhost kernel: [82701.953614] host212: xid   45:
> > f_ctl  90000 seq  2
> > Aug 14 09:34:59 localhost kernel: [82701.953832] scsi 212:0:1:0:
> > Direct-Access     DGC      RAID 5           0326 PQ: 0 ANSI: 4
> > Aug 14 09:34:59 localhost kernel: [82701.954351] sd 212:0:1:0: Attached
> > scsi generic sg1 type 0
> > Aug 14 09:34:59 localhost kernel: [82701.954448] host212: xid    2:
> > f_ctl  90000 seq  1
> > Aug 14 09:34:59 localhost kernel: [82701.954523] host212: xid    a:
> > f_ctl  90000 seq  1
> > Aug 14 09:34:59 localhost kernel: [82701.954593] host212: xid   12:
> > f_ctl  90000 seq  1
> > Aug 14 09:34:59 localhost kernel: [82701.954596] host212: xid   12:
> > f_ctl  90000 seq  2
> > Aug 14 09:34:59 localhost kernel: [82701.954606] sd 212:0:1:0: [sdb]
> > 8388608 512-byte logical blocks: (4.29 GB/4.00 GiB)
> > 
> > I'll poke around in your patch and try to figure out what's going on. I
> > would appreciate it if you could take a look too.
> > 
> > Thanks, //Rob
> > 
> 
> _______________________________________________
> devel mailing list
> [email protected]
> http://www.open-fcoe.org/mailman/listinfo/devel
