Sean Hefty wrote:
IB_CM_REJ_STALE_CONN is sent in the following situations:
* Remote ID in REQ matches a connection that is in timewait.  This is treated as
a duplicate REQ that was processed after the connection had been terminated.
* Remote QPN in REQ or REP matches an existing connection, and REQ/REP was not
detected as a duplicate.
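
To make the two cases above concrete, here is a rough sketch of the check as I read the description. It is a toy model, not the ib_cm code: the types, the table, and classify_req() are all hypothetical names, and it ignores the remote port/node identity that a real match would presumably also have to compare.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a CM connection table; not the kernel's structures. */
enum conn_state { CONN_ESTABLISHED, CONN_TIMEWAIT };

struct conn {
    uint32_t remote_id;   /* remote communication ID */
    uint32_t remote_qpn;  /* remote QP number */
    enum conn_state state;
};

enum req_verdict { REQ_ACCEPT, REQ_DUPLICATE, REQ_REJ_STALE_CONN };

/* Classify an incoming REQ against the known connections. */
static enum req_verdict classify_req(const struct conn *tbl, size_t n,
                                     uint32_t remote_id, uint32_t remote_qpn)
{
    size_t i;

    /* Case 1: the remote ID is already known. */
    for (i = 0; i < n; i++) {
        if (tbl[i].remote_id != remote_id)
            continue;
        /* Connection already in timewait: a late duplicate REQ, so stale. */
        if (tbl[i].state == CONN_TIMEWAIT)
            return REQ_REJ_STALE_CONN;
        /* Otherwise it is an ordinary duplicate of a live connection. */
        return REQ_DUPLICATE;
    }

    /* Case 2: not a duplicate, but the remote QPN is already in use. */
    for (i = 0; i < n; i++) {
        if (tbl[i].remote_qpn == remote_qpn)
            return REQ_REJ_STALE_CONN;
    }

    return REQ_ACCEPT;
}
```

The ID lookup runs first so that a plain duplicate REQ is never mistaken for a QPN collision.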

OK, thanks for the clarification.

On the other side, when the CM receives a reject message with that reason, the
local handle (id) is moved to the timewait state. My understanding is that it
sits there for a while, then a reject/stale-connection callback is delivered to
the user and the id is removed.

correct
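
The sequence just confirmed can be sketched as a tiny state machine. This is only an illustration under my own assumptions: the state names, the tick-based timer, and both functions are hypothetical and do not correspond to the actual CM implementation.

```c
#include <stdbool.h>

/* Hypothetical states for the local id; not the kernel's enum. */
enum id_state { ID_REQ_SENT, ID_TIMEWAIT, ID_REMOVED };

struct cm_id_model {
    enum id_state state;
    int timewait_left;   /* remaining timewait, in abstract ticks */
    bool user_notified;  /* set once the stale-connection callback fires */
};

/* A REJ with the stale-connection reason arrives: park the id in timewait. */
static void on_stale_rej(struct cm_id_model *id, int timewait_ticks)
{
    id->state = ID_TIMEWAIT;
    id->timewait_left = timewait_ticks;
}

/* One timer tick: when timewait expires, notify the user and drop the id. */
static void on_tick(struct cm_id_model *id)
{
    if (id->state != ID_TIMEWAIT)
        return;
    if (--id->timewait_left > 0)
        return;
    id->user_notified = true;  /* reject/stale-connection callback */
    id->state = ID_REMOVED;
}
```

Note that the user hears nothing until timewait has fully expired, which is the delay described above.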

I don't see what the user can do when the CM detects a remote QPN match. If they keep using the same QPN, this will repeat in an endless loop, correct?

This is missing.  But neither the DREQ nor the DREP generated in this case
drives the state machines.  Both messages are simply generated and then consumed
by the CM.  (I don't even think it's clear if the local and remote IDs in the
DREQ/DREP are relative to the stale connection, or the new connection
request/reply.)

I agree that it's quite unclear from the spec whether the IDs to be used in the DREQ are those of the new connection or the stale one. Specifically, those of the stale connection might no longer exist in the CM that gets the DREQ, in which case it would just be dropped, so there's no real gain in implementing this.

Correct - keep-alive messages are still needed by apps to know if their
connections are still valid.  IMO, stale connection detection becomes less
useful as the number of systems being connected to increases.
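
For reference, the kind of application-level keep-alive meant here can be as small as the sketch below. Everything in it is an assumption for illustration (the struct, the millisecond clock, the missed-interval threshold); it is not an existing IB or CM interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-connection keep-alive bookkeeping kept by the app. */
struct keepalive {
    uint64_t last_heard_ms;  /* time the peer was last heard from */
    uint64_t interval_ms;    /* expected keep-alive period */
    unsigned max_missed;     /* intervals tolerated before giving up */
};

/* Call whenever any message (data or keep-alive) arrives from the peer. */
static void keepalive_heard(struct keepalive *ka, uint64_t now_ms)
{
    ka->last_heard_ms = now_ms;
}

/* True while the peer has been heard within max_missed intervals. */
static bool keepalive_peer_alive(const struct keepalive *ka, uint64_t now_ms)
{
    return now_ms - ka->last_heard_ms <=
           ka->interval_ms * (uint64_t)ka->max_missed;
}
```

The app would call keepalive_peer_alive() from a periodic timer and tear the connection down itself once it returns false, rather than relying on the CM to report staleness.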

Is there anything the IB stack can do here to make application code simpler? In the past I suggested having the IB CM register for inform info "GID out" to catch remote ports going down. Thinking about it again, though: when a port goes down, an RC QP pair doesn't go down with it unless there was in-flight data, so if the CM delivered a disconnect event it might be a false alarm. This registration would also put load on the SA, so it does not scale well, unless we make it a CM feature that users enable on target nodes but not on initiators...

Or.
