On 10/02/14 12:34, Bart Van Assche wrote:
> On 09/20/14 19:45, Or Gerlitz wrote:
>> On Fri, Sep 19, 2014 at 3:58 PM, Bart Van Assche <[email protected]>
>> wrote:
>>> Attempting to connect three times may be insufficient after an
>>> initiator system that was using multiple RDMA channels tries to
>>> relogin. Additionally, this login retry mechanism is a workaround
>>> for particular behavior of the IB/CM.
>>
>> Can you be more specific re the particular behavior of the IB CM?
>> added Sean, the CM maintainer.
>
> Let's focus on the software behavior instead of the people who are
> involved. What I have observed several times is that after a power cycle
> of the initiator system the first few login attempts are rejected. I was
> assuming that this was due to the IB/CM implementation but now that I
> have had another look at the logs I see that there is not enough
> information in the system logs to draw this conclusion. I will add
> additional logging statements in the initiator and target kernel code
> such that I can determine the root cause of this behavior.

(replying to my own e-mail / removed linux-scsi from CC-list)

So far I have been able to reproduce this behavior once after pushing
the reset button of the initiator system while it was in the connected
state. After the initiator system had finished rebooting I started
ibdump on both IB ports of the target system (attached to this e-mail).
What surprised me is that I found all the messages I expected in the
ibdump output (e.g. IB MAD device management query) but no CM messages.
Both sides were running FW 2.32.5100.

The following messages were logged at the initiator side while ibdump
was running at the target side:

Oct 02 17:43:42 msi kernel: scsi host14: ib_srp: REJ received
Oct 02 17:43:42 msi kernel: scsi host14: REJ reason: stale connection
Oct 02 17:43:42 msi kernel: scsi host14: ib_srp: giving up on stale connection
Oct 02 17:43:42 msi kernel: scsi host14: ib_srp: Connection 0/12 failed
Oct 02 17:43:42 msi kernel: scsi host15: ib_srp: REJ received
Oct 02 17:43:42 msi kernel: scsi host15: REJ reason: stale connection
Oct 02 17:43:42 msi kernel: scsi host15: ib_srp: giving up on stale connection
Oct 02 17:43:42 msi kernel: scsi host15: ib_srp: Connection 0/12 failed

After a few more login attempts SRP login succeeded.

Bart.

p1.pcap
Description: application/vnd.tcpdump.pcap
p2.pcap
Description: application/vnd.tcpdump.pcap
