Linux has quite a sophisticated mechanism to maintain, cache, probe, invalidate, and update the network stack's L2 neighbour info.

Path records are not just L2 info. They contain L4, L3, and L2 info together.
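To make that concrete, here is a rough sketch of which path record fields belong to which layer. This is illustrative only; the real definition is struct ib_sa_path_rec in the kernel's <rdma/ib_sa.h>, and the names here only loosely follow it:

```c
#include <stdint.h>
#include <string.h>

/* Illustrative sketch only -- not the real kernel definition. */
struct example_path_rec {
    uint64_t service_id;                  /* upper-layer (service) identity */
    uint8_t  dgid[16], sgid[16];          /* L3: GIDs carried in the GRH */
    uint32_t flow_label;                  /* L3 */
    uint8_t  hop_limit, traffic_class;    /* L3 */
    uint16_t dlid, slid;                  /* L2: LIDs carried in the LRH */
    uint16_t pkey;                        /* L2: partition key */
    uint8_t  sl;                          /* L2: service level */
    uint8_t  mtu, rate, packet_life_time; /* path properties */
};

/* A path is "global" (needs a GRH) roughly when it leaves the local
 * subnet, which the SA signals with a nonzero hop limit. */
static int example_path_needs_grh(const struct example_path_rec *pr)
{
    return pr->hop_limit > 0;
}
```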

For example, in the Voltaire gen1 stack we had an ib arp module which was used by both IPoIB and native IB ULPs (SDP, iSER, Lustre, etc). This module managed a path cache of sorts, where IPoIB always asked for a non-cached path while the other ULPs were willing to accept a cached one.

IMO, using a cached AH is no different than using a cached path. You're simply mapping the PR data into another structure.
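The mapping in question is essentially a field-by-field copy; the kernel does this in ib_init_ah_from_path(). A minimal sketch, with hypothetical stand-in types (the real ones are struct ib_sa_path_rec and struct ib_ah_attr):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for struct ib_sa_path_rec / struct ib_ah_attr. */
struct ex_path_rec { uint16_t dlid; uint8_t sl; uint8_t rate; };
struct ex_ah_attr  { uint16_t dlid; uint8_t sl; uint8_t static_rate;
                     uint8_t port_num; };

/* Caching the AH vs. caching the PR: the AH is just the subset of PR
 * fields the HCA needs to address the remote port, copied once. */
static void ex_init_ah_from_path(const struct ex_path_rec *pr,
                                 uint8_t port_num,
                                 struct ex_ah_attr *ah)
{
    memset(ah, 0, sizeof(*ah));
    ah->dlid        = pr->dlid;
    ah->sl          = pr->sl;
    ah->static_rate = pr->rate;
    ah->port_num    = port_num;
}
```

Either way, the data originates in one PR query; the only question is which structure you keep it in.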

We're ignoring the real problem here: a centralized SA doesn't scale. MPI stacks have largely sidestepped this by simply not doing path record queries. Path information is often hard-coded, with QPN data exchanged out of band over sockets (often over Ethernet).
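The out-of-band bootstrap amounts to each rank sending its addressing info over an ordinary TCP socket before any IB traffic flows. A hypothetical wire format (all names illustrative, not taken from any particular MPI stack):

```c
#include <arpa/inet.h>  /* htons/ntohs, htonl/ntohl */
#include <stdint.h>
#include <string.h>

/* Hypothetical per-rank bootstrap record exchanged over a socket. */
struct oob_conn_info {
    uint16_t lid;   /* L2 address within the subnet */
    uint32_t qpn;   /* queue pair number */
    uint32_t psn;   /* initial packet sequence number */
};

/* Pack into a fixed 10-byte, network-byte-order buffer for send(). */
static void oob_pack(const struct oob_conn_info *ci, uint8_t buf[10])
{
    uint16_t lid = htons(ci->lid);
    uint32_t qpn = htonl(ci->qpn), psn = htonl(ci->psn);
    memcpy(buf,     &lid, 2);
    memcpy(buf + 2, &qpn, 4);
    memcpy(buf + 6, &psn, 4);
}

/* Unpack the peer's record on the receiving side. */
static void oob_unpack(const uint8_t buf[10], struct oob_conn_info *ci)
{
    uint16_t lid; uint32_t qpn, psn;
    memcpy(&lid, buf,     2);
    memcpy(&qpn, buf + 2, 4);
    memcpy(&psn, buf + 6, 4);
    ci->lid = ntohs(lid);
    ci->qpn = ntohl(qpn);
    ci->psn = ntohl(psn);
}
```

Note what is missing: SL, MTU, rate, and pkey, i.e. exactly the information a real path record query would have provided.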

We've seen problems running large MPI jobs without PR caching. I know that Silverstorm/QLogic did as well. And apparently Voltaire hit the same type of problem, since you added a caching module. (Did Mellanox and Topspin/Cisco create PR caches as well?) At least three companies working on IB came up with the same solution. What is the objection to the current patch set?

- Sean
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
