On Tue, May 13, 2008 at 7:13 AM, Or Gerlitz <[EMAIL PROTECTED]> wrote:
> RDMA_ALIGN_WITH_NETDEVICE high availability (ha) mode means that the consumer
> of the rdma-cm wants that RDMA sessions would always use the same links (eg
> <hca/port>) as the IP stack does. In the current code, this does not happen
> when bonding did fail-over but the IB link used by an already existing
> session is operating fine.
> For now this mode is supported only for the connected services of the
> rdma-cm.

I'm not sure I've ever seen an "RDMA Session". There are lots of RDMA
*connections*, and there are RDMA applications that have an application-layer
session that uses several RDMA connections. But I'm fairly certain that there
is no such thing as an "RDMA Session".

Which raises some serious doubts about an automatic connection tear down based
upon decisions at the RDMA layer.

This will also create problems with iWARP/IB compatibility. The iWARP
standards (IETF and RDMAC) both solve the problem of RDMA endpoint / IP
Address affinity by simply mandating it. While no real solution is given in
the standards, it has generally been interpreted to mean:

- You cannot create an RDMA connection on a device (or assign an existing TCP
  connection to an RDMA endpoint) if the device is not a valid route given
  the source/destination IP Addresses.
- You can determine the set of possible RDMA devices by first consulting the
  local routing tables using the desired source and destination IP addresses.
- If an RDMA device is no longer a valid route for a connection then the
  underlying TCP connection will fail (and it would be real nice if this
  happened promptly if the reason is a network reconfiguration rather than
  just waiting for things to fail).

An important corner case here is that there may not be a need to migrate an
existing RDMA Connection to a new device just because the *preferred* route
has changed. The non-preferred route may still be fully operable and it may
be preferable to continue using it for *this* connection given the cost of
tear down and start up.

Keep in mind that if the old route does not work then it will fail fairly
quickly. If failing quickly is important then the device should have
mechanisms to ensure that it does not keep stale ARP or Neighbor Discovery
entries lingering around. If the ARP/ND information is erased the connection
will be torn down very quickly (destination unreachable).

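As a rough illustration of the interpretation above, here is a minimal Python sketch. It is not taken from the iWARP standards or from the rdma-cm code. The `Route` record, the device names and the three helper functions are all hypothetical, and the routing table is a toy list rather than the kernel's. It shows a connection being allowed only on a device that is a valid route for the source/destination pair, and torn down only when its device stops being a valid route, not merely when it stops being the preferred one.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """Toy routing-table entry (hypothetical; not a kernel structure)."""
    prefix: str      # destination prefix, e.g. "10.0.0.0/24"
    device: str      # name of the RDMA-capable device behind this route
    src: str         # local source address usable on this route
    metric: int = 0  # lower metric means more preferred


def candidate_devices(routes, src, dst):
    """Devices that are valid routes for (src, dst), most preferred first."""
    dst_ip = ipaddress.ip_address(dst)
    matches = [
        r for r in routes
        if r.src == src and dst_ip in ipaddress.ip_network(r.prefix)
    ]
    # Longest prefix wins; ties are broken by the lower metric.
    matches.sort(
        key=lambda r: (-ipaddress.ip_network(r.prefix).prefixlen, r.metric)
    )
    devices = []
    for r in matches:
        if r.device not in devices:
            devices.append(r.device)
    return devices


def may_connect(routes, device, src, dst):
    """A connection may be created on a device only if it is a valid route."""
    return device in candidate_devices(routes, src, dst)


def should_tear_down(routes, device, src, dst):
    """Tear down only if the device is no longer a valid route at all.

    A device that is still valid but no longer *preferred* keeps its
    existing connection (the corner case described above).
    """
    return not may_connect(routes, device, src, dst)
```

With two routes to the same prefix, swapping their metrics changes the order returned by `candidate_devices` but leaves `should_tear_down` false for the old device. Only removing the old device's route makes it true.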
Now for both IB and iWARP there is a substantial possibility that a
connection can be migrated to a different port within the same or
co-operating devices. In that case the High Availability is achieved without
the application having to be involved at all.

If the connection is going to have to be re-established on a *different*
device there is a substantial risk that this will involve re-registering
memory, re-connecting, and re-advertising buffers. I don't see how you can
wisely decide that the benefits of a preferred route outweigh these costs on
an application-independent basis. What if the application was nearly done
with the connection? Or knew that it would be ending a current burst of
activity in a few seconds and could pay for the connection shift-back then?

And if the application is going to make the decision, then can't it just
subscribe to the local routing tables on its own without any help from OFA?
Even if it is in response to a failure on the old connection, any application
that has a "session" concept will have procedures for re-establishing the
session on a new connection. Where is the need for a one-size-fits-none
standardized solution?
