On 10/02/14 12:34, Bart Van Assche wrote:
> On 09/20/14 19:45, Or Gerlitz wrote:
>> On Fri, Sep 19, 2014 at 3:58 PM, Bart Van Assche <[email protected]> wrote:
>>> Attempting to connect three times may be insufficient after an
>>> initiator system that was using multiple RDMA channels tries to
>>> relogin. Additionally, this login retry mechanism is a workaround
>>> for particular behavior of the IB/CM.
>>
>> Can you be more specific re the particular behavior of the IB CM?
>> added Sean, the CM maintainer.
> 
> Let's focus on the software behavior instead of the people who are 
> involved. What I have observed several times is that after a power cycle 
> of the initiator system the first few login attempts are rejected. I was 
> assuming that this was due to the IB/CM implementation but now that I 
> have had another look at the logs I see that there is not enough 
> information in the system logs to draw this conclusion. I will add 
> additional logging statements in the initiator and target kernel code 
> such that I can determine the root cause of this behavior.

(replying to my own e-mail / removed linux-scsi from CC-list)

So far I have been able to reproduce this behavior once, after pushing
the reset button of the initiator system while it was in the connected
state. After the initiator system had finished rebooting I started
ibdump on both IB ports of the target system (the captures are attached
to this e-mail). What surprised me is that I found all the other
messages I expected in the ibdump output (e.g. an IB MAD device
management query) but no CM messages at all. Both sides were running
FW 2.32.5100. The following messages were logged at the initiator side
while ibdump was running at the target side:

Oct 02 17:43:42 msi kernel: scsi host14: ib_srp: REJ received
Oct 02 17:43:42 msi kernel: scsi host14:   REJ reason: stale connection
Oct 02 17:43:42 msi kernel: scsi host14: ib_srp: giving up on stale connection
Oct 02 17:43:42 msi kernel: scsi host14: ib_srp: Connection 0/12 failed
Oct 02 17:43:42 msi kernel: scsi host15: ib_srp: REJ received
Oct 02 17:43:42 msi kernel: scsi host15:   REJ reason: stale connection
Oct 02 17:43:42 msi kernel: scsi host15: ib_srp: giving up on stale connection
Oct 02 17:43:42 msi kernel: scsi host15: ib_srp: Connection 0/12 failed

After a few more login attempts SRP login succeeded.
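
To make it easier to compare runs, the initiator-side messages above can be
tallied per SCSI host. What follows is only a minimal sketch, assuming the log
lines have exactly the format shown above; the names LINE_RE, SAMPLE and
summarize() are made up for illustration and are not part of ib_srp or of any
existing tool.

```python
import re
from collections import Counter

# Matches kernel messages of the form shown in this e-mail, e.g.
# "Oct 02 17:43:42 msi kernel: scsi host14: ib_srp: Connection 0/12 failed".
# The "ib_srp: " prefix is optional because the "REJ reason" line lacks it.
LINE_RE = re.compile(r"kernel: scsi (host\d+):\s+(?:ib_srp: )?(.*)$")

# Log excerpt copied from this e-mail (hypothetical variable name).
SAMPLE = """\
Oct 02 17:43:42 msi kernel: scsi host14: ib_srp: REJ received
Oct 02 17:43:42 msi kernel: scsi host14:   REJ reason: stale connection
Oct 02 17:43:42 msi kernel: scsi host14: ib_srp: giving up on stale connection
Oct 02 17:43:42 msi kernel: scsi host14: ib_srp: Connection 0/12 failed
Oct 02 17:43:42 msi kernel: scsi host15: ib_srp: REJ received
Oct 02 17:43:42 msi kernel: scsi host15:   REJ reason: stale connection
Oct 02 17:43:42 msi kernel: scsi host15: ib_srp: giving up on stale connection
Oct 02 17:43:42 msi kernel: scsi host15: ib_srp: Connection 0/12 failed
"""


def summarize(lines):
    """Count stale-connection rejects and failed connections per SCSI host."""
    stale = Counter()
    failed = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # not a "scsi hostN" kernel message
        host, msg = m.groups()
        if msg.startswith("REJ reason: stale connection"):
            stale[host] += 1
        elif re.match(r"Connection \d+/\d+ failed", msg):
            failed[host] += 1
    return stale, failed


stale, failed = summarize(SAMPLE.splitlines())
for host in sorted(set(stale) | set(failed)):
    print(f"{host}: stale REJ={stale[host]} failed connections={failed[host]}")
```

Feeding the full initiator log through summarize() instead of SAMPLE should
show how many stale-connection rejects each host sees before a login
succeeds, under the assumption that the message format does not change.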

Bart.

Attachment: p1.pcap
Description: application/vnd.tcpdump.pcap

Attachment: p2.pcap
Description: application/vnd.tcpdump.pcap
