Sean Hefty wrote:
I am willing to go with the local SA serving large MPI jobs, so that you load it as a prerequisite to spawning a large all-to-all job.

But I think the default for IPoIB needs to be the use of non-cached PRs.

I think this ties together two things that aren't directly related. We have two network stacks running on top of each other here. Their policies should be separate.

The rationale behind my argument is that IPoIB is an L2 packet service for the network stack. When the network stack decides to renew its L2 info for a neighbour (e.g. because the neighbour does not reply to direct probes) and IPoIB uses cached IB info, it is doing something against what it was asked to do.

As an example, let's reverse this. Imagine instead that you implement IB over IP. Should an IB path refresh policy dictate that IP update its ARP tables?

In this setting (IB above IP), yes.

Or, looking at it differently, do you prevent IP from updating the ARP table unless the IB stack asks for it?

No. If the lower stack wants to update its L2 info, it's perfectly fine.

For example, the current IPoIB implementation flushes all its IB L2 info (address handles and PRs) when it gets an IB event on the port (up/down, LID change, SM LID change, client re-register, etc.). This is very much the correct design.
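To make the flush-on-event behaviour concrete, here is a minimal user-space sketch. It is not the IPoIB code: the event enum, the table layout and all the sketch_* names are hypothetical, made up only to illustrate the idea that any port-level event invalidates every cached L2 entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event set, modelled on the events listed above. */
enum sketch_port_event {
    SKETCH_EV_PORT_UP,
    SKETCH_EV_PORT_DOWN,
    SKETCH_EV_LID_CHANGE,
    SKETCH_EV_SM_LID_CHANGE,
    SKETCH_EV_CLIENT_REREGISTER,
    SKETCH_EV_OTHER            /* anything unrelated to the port's L2 state */
};

#define SKETCH_L2_SLOTS 4

/* One cached L2 entry; stands in for an address handle plus its PR. */
struct sketch_l2_entry {
    bool valid;
    unsigned int dlid;
};

struct sketch_l2_table {
    struct sketch_l2_entry e[SKETCH_L2_SLOTS];
};

/* Count the entries that are still usable. */
static size_t sketch_count_valid(const struct sketch_l2_table *t)
{
    size_t i, n = 0;

    assert(t != NULL);
    for (i = 0; i < SKETCH_L2_SLOTS; i++)
        if (t->e[i].valid)
            n++;
    return n;
}

/* Decide whether an event means the cached L2 info can no longer be trusted. */
static bool sketch_event_invalidates(enum sketch_port_event ev)
{
    return ev != SKETCH_EV_OTHER;
}

/* On a port-level event, drop everything and let it be re-resolved on demand. */
static void sketch_on_port_event(struct sketch_l2_table *t,
                                 enum sketch_port_event ev)
{
    size_t i;

    assert(t != NULL);
    if (!sketch_event_invalidates(ev))
        return;
    for (i = 0; i < SKETCH_L2_SLOTS; i++)
        t->e[i].valid = false;
}
```

The point of the sketch is only that the lower stack is free to throw away its own L2 info whenever it sees fit, without the upper stack asking for it.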

The policy for local PR caching should be set by an administrator. Now, we could provide a policy setting that ties it to the ARP cache, which sounds like a good idea. This will be less efficient in some use models, more efficient in others. But not all PRs belong to IPoIB, so we need a way to handle this. However, I don't believe that we have to always enforce such a policy, especially since the current stack doesn't have this behavior today.

I think we are making progress and starting to converge.

My suggestion is that if you put the PR caching code within the ib_sa module, you add a parameter to ib_sa_path_rec_get() with which the caller specifies whether it is willing to get a cached PR. I also suggest that rdma_resolve_route() be enhanced with a similar parameter, so that even native IB ULPs can ask for non-cached info if they want to.

For example, I think it would be correct for IB block and file I/O ULPs (iSER, SRP, Lustre, rNFS, etc.) to request non-cached PRs: their connection model is not all-to-all but rather n-to-m (n clients to m servers, with m << n), the connections are long-lived (hours, days, weeks or more), and a connection failure caused by PR caching does not seem acceptable.
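A rough user-space sketch of the per-caller choice suggested above follows. It does not use the real ib_sa_path_rec_get() or rdma_resolve_route() signatures; every sketch_* name, the cache layout and the simulated SA query are assumptions made up for illustration. The only thing it is meant to show is a lookup that takes an allow_cached flag and goes to the SA whenever the caller refuses cached info.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_CACHE_SLOTS 8

/* The part of a path record this sketch models (hypothetical). */
struct sketch_path_rec {
    unsigned int dlid;
};

struct sketch_cache_entry {
    bool valid;
    unsigned int dgid;              /* simplified destination identifier */
    struct sketch_path_rec rec;
};

struct sketch_sa {
    struct sketch_cache_entry slots[SKETCH_CACHE_SLOTS];
    unsigned int sa_queries;        /* lookups that went to the simulated SA */
};

/* Stand-in for a real SA query; the returned DLID is arbitrary. */
static struct sketch_path_rec sketch_query_sa(struct sketch_sa *sa,
                                              unsigned int dgid)
{
    struct sketch_path_rec rec;

    sa->sa_queries++;
    rec.dlid = dgid + 100;
    return rec;
}

/*
 * Resolve a path record. If allow_cached is false, the cache is bypassed
 * and the SA is always queried; the fresh answer still refreshes the cache
 * for callers that do accept cached info.
 */
static struct sketch_path_rec sketch_path_rec_get(struct sketch_sa *sa,
                                                  unsigned int dgid,
                                                  bool allow_cached)
{
    struct sketch_cache_entry *e;

    assert(sa != NULL);
    e = &sa->slots[dgid % SKETCH_CACHE_SLOTS];

    if (allow_cached && e->valid && e->dgid == dgid)
        return e->rec;

    e->rec = sketch_query_sa(sa, dgid);
    e->dgid = dgid;
    e->valid = true;
    return e->rec;
}
```

Under this scheme a large all-to-all MPI job would pass allow_cached = true, while IPoIB and the long-lived storage ULPs would pass false.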

Or.


