Michael S. Tsirkin wrote:

I didn't follow this. Is this just an out-of-band keep-alive message?

Yes. Exactly.

You may know that for each neighbour, the Linux network stack sends a --unicast-- ARP probe every m jiffies, and if there is still no ARP reply after n jiffies, it sends a broadcast ARP.
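The unicast-then-broadcast probing described above can be modelled roughly as follows. This is a simplified sketch, not kernel code. It assumes a fixed budget of unicast probes, roughly the role of the `ucast_solicit` neighbour sysctl, before the stack falls back to broadcast. It ignores timers and the kernel's NUD state machine, and the class and method names are hypothetical.

```python
from enum import Enum


class Probe(Enum):
    UNICAST = "unicast"      # sent to the cached hardware address
    BROADCAST = "broadcast"  # sent to everyone on the link


class NeighbourModel:
    """Hypothetical, simplified model of ARP re-probing for one neighbour."""

    def __init__(self, ucast_probes=3):
        self.ucast_probes = ucast_probes  # unicast attempts before broadcasting
        self.unanswered = 0               # probes sent since the last reply

    def next_probe(self):
        # Unicast while the budget lasts, then fall back to broadcast.
        if self.unanswered < self.ucast_probes:
            kind = Probe.UNICAST
        else:
            kind = Probe.BROADCAST
        self.unanswered += 1
        return kind

    def reply_received(self):
        # Any ARP reply confirms the neighbour and resets the counter.
        self.unanswered = 0


n = NeighbourModel(ucast_probes=3)
print([n.next_probe().value for _ in range(5)])
# ['unicast', 'unicast', 'unicast', 'broadcast', 'broadcast']
```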

How does this solve the problem?
If the remote side has lost the connection, unicast ARPs will get dropped
but broadcast ARPs will get answered. We'd need to re-create the connection
if this happens, but is there a way to detect this?

Yes, I know that there is a way to register for kernel-level neighbour update events. So on each neighbour update, IPoIB CM reconnects. Plus, you can remove the fast-path memcmp we do today on the remote GUID, and we are done :)

This works because it covers both cases in which the unicast ARP probe goes unanswered: either the --GID-- we have is not the correct one (e.g. under an HA scheme), or the remote --QP-- is not what we think it is.
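The proposal can be sketched as a small user-space model. In the kernel, the hook would presumably be a notifier registered with `register_netevent_notifier()` that reacts to `NETEVENT_NEIGH_UPDATE`. Here a plain method call stands in for that notifier, and the class and function names (`CmNeighbour`, `split_hwaddr`, `make_hwaddr`) are hypothetical.

The sketch relies on the IPoIB link-layer address (RFC 4391) carrying both the remote QPN and the GID. That is why a single update-driven reconnect covers both the wrong-GID case and the wrong-QP case.

```python
import struct

# IPoIB link-layer address layout (RFC 4391): 8 reserved/flag bits,
# a 24-bit QPN, then the 128-bit port GID, 20 bytes in total.
IPOIB_HWADDR_LEN = 20


def split_hwaddr(hwaddr: bytes) -> tuple[int, bytes]:
    """Return (qpn, gid) from a 20-byte IPoIB hardware address."""
    if len(hwaddr) != IPOIB_HWADDR_LEN:
        raise ValueError("IPoIB hardware address must be 20 bytes")
    (first_word,) = struct.unpack(">I", hwaddr[:4])
    return first_word & 0xFFFFFF, hwaddr[4:]


def make_hwaddr(qpn: int, gid: bytes) -> bytes:
    """Build a 20-byte IPoIB hardware address (flags byte left as zero)."""
    return struct.pack(">I", qpn & 0xFFFFFF) + gid


class CmNeighbour:
    """Hypothetical model of a CM connection tied to one neighbour entry."""

    def __init__(self, hwaddr: bytes):
        self.qpn, self.gid = split_hwaddr(hwaddr)
        self.reconnects = 0

    def on_neigh_update(self, new_hwaddr: bytes) -> None:
        # Proposed behaviour: reconnect on every neighbour update event.
        # The new address carries both the GID and the QPN, so a moved GID
        # (e.g. HA failover) and a new remote QP are handled the same way.
        self.qpn, self.gid = split_hwaddr(new_hwaddr)
        self.reconnects += 1

    def send(self, payload: bytes) -> tuple[int, bytes, bytes]:
        # Fast path: no per-packet memcmp of the remote address; the
        # connection state is trusted because updates trigger reconnects.
        return self.qpn, self.gid, payload


gid_a = bytes(range(16))
gid_b = bytes(range(16, 32))
peer = CmNeighbour(make_hwaddr(0x48, gid_a))
peer.on_neigh_update(make_hwaddr(0x48, gid_b))  # GID moved (HA case)
peer.on_neigh_update(make_hwaddr(0x52, gid_b))  # remote QP changed
print(peer.reconnects, hex(peer.qpn))  # 2 0x52
```

The point of moving the check into the update handler is that the cost is paid once per neighbour change rather than once per packet.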

Or.


_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
