On Thu, Jul 26, 2007 at 10:21:16AM -0700, Sean Hefty wrote:
> >My suggestion is that if you put the PR caching code within the ib_sa
> >module, add a parameter for the ib_sa_path_rec_get() where the caller
> >specifies if it is willing to get cached PR or not. Also I suggest that
> >rdma_resolve_route() should be also enhanced to have a similar param
> >such that even native IB based ULPs can ask for not cached info if they
> >want to.
>
> I still believe that these should be separate policies. Consider that
> the cache could have updated immediately before a PR lookup from IPoIB -
> perhaps in response to an SA event.

FWIW, I agree with Sean. The kernel cache must be authoritative and must
not be overridden by a ULP. View this as the first step to creating a
distributed SA, not as the first step to generalized PR caching.

Linking things like ARP failures and QP failures to cache 'invalidates'
is, IMHO, ultimately pointless. My view is that the SA will have to grow
a means to refresh data in the distributed SA when it reconfigures the
network. We have parts of this today via the various SA traps, but no
per-GID invalidation.

A client is probably going to detect a problem in the network before the
SM can fix it, so doing a PR query at that point will just get the same
old bad data. Further, in many cases the SM can likely re-route the
broken path so that the old PR is still valid. The number of times you
actually need to change a PR once issued should be very small. If your
network cares about fast failover then it should have a high LMC and
rely on IB's explicit multipath feature, and the kernel cache design
should support this.

This same argument is why IPoIB ARP decisions really have no bearing on
IB PRs. IPoIB ARP logic and refreshes are designed to support the
distributed ND lookup model - IB PRs have completely different lifetime
rules that are totally unrelated to ARP's lifetime rules. The existing
trap monitoring in Sean's module covers about 90% of the cases in IB
where you need to invalidate a PR; the last 10% will need something
new :(

Sean, it seems to me that a lot of what is being talked about here
really boils down to policy decisions about the caching. Maybe you'd see
less resistance if the kernel module didn't have any policy and that was
left to userspace. Even your choice today of putting the big GetTable
query in the kernel strikes me as something I'd prefer to see in
userspace.
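
To make the "authoritative cache" point concrete, here is a rough
userspace sketch in C. It is only an illustration under assumptions:
every name in it (pr_cache, pr_cache_lookup, pr_cache_invalidate_gid and
so on) is hypothetical and none of it is the ib_sa API or Sean's code.
The lookup path takes no "bypass the cache" flag, so a ULP cannot
override the cache, and entries are dropped only by SA-driven events: a
coarse flush standing in for the trap handling that exists today, and a
per-GID invalidate standing in for the piece that is still missing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout; this is not the ib_sa interface. */
#define PR_CACHE_SLOTS 8
#define GID_LEN 16

struct pr_entry {
    uint8_t  dgid[GID_LEN]; /* destination GID, used as the lookup key */
    uint16_t dlid;          /* stand-in for the rest of the path record */
    bool     valid;
};

struct pr_cache {
    struct pr_entry slot[PR_CACHE_SLOTS];
};

/* Fill path: meant to be called only when an SA query completes. */
static bool pr_cache_insert(struct pr_cache *c, const uint8_t *dgid,
                            uint16_t dlid)
{
    struct pr_entry *free_slot = NULL;
    int i;

    assert(c != NULL && dgid != NULL);
    for (i = 0; i < PR_CACHE_SLOTS; i++) {
        struct pr_entry *e = &c->slot[i];

        if (e->valid && memcmp(e->dgid, dgid, GID_LEN) == 0) {
            e->dlid = dlid; /* refresh an existing entry in place */
            return true;
        }
        if (!e->valid && free_slot == NULL)
            free_slot = e;
    }
    if (free_slot == NULL)
        return false;       /* cache full; the sketch has no eviction */
    memcpy(free_slot->dgid, dgid, GID_LEN);
    free_slot->dlid = dlid;
    free_slot->valid = true;
    return true;
}

/* ULP-facing lookup: deliberately no "give me uncached data" argument. */
static bool pr_cache_lookup(const struct pr_cache *c, const uint8_t *dgid,
                            uint16_t *dlid)
{
    int i;

    for (i = 0; i < PR_CACHE_SLOTS; i++) {
        const struct pr_entry *e = &c->slot[i];

        if (e->valid && memcmp(e->dgid, dgid, GID_LEN) == 0) {
            *dlid = e->dlid;
            return true;
        }
    }
    return false;           /* miss: the caller would go ask the SA */
}

/* SA event, coarse form: drop every entry. */
static void pr_cache_flush(struct pr_cache *c)
{
    int i;

    for (i = 0; i < PR_CACHE_SLOTS; i++)
        c->slot[i].valid = false;
}

/* SA event, per-GID form: the invalidation that does not exist today. */
static bool pr_cache_invalidate_gid(struct pr_cache *c, const uint8_t *dgid)
{
    int i;

    for (i = 0; i < PR_CACHE_SLOTS; i++) {
        struct pr_entry *e = &c->slot[i];

        if (e->valid && memcmp(e->dgid, dgid, GID_LEN) == 0) {
            e->valid = false;
            return true;
        }
    }
    return false;
}
```

Note that nothing on the ULP side (ARP timeouts, QP errors) can reach
the two invalidate functions in this sketch; whether a real cache should
refresh eagerly or lazily after an event is exactly the kind of policy I
would rather leave to userspace.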

Jason
