So, my main concern is with the role of kernel caching and especially with
how control is exported to user space.

The only control currently exported by the local SA is a module parameter that allows a user to force a refresh of the entire cache. I do not want to extend this until we can get at least some basic PR caching functionality merged.

I want something small that we can build on, and the local_sa patch is already 1300 lines of code, with another 1000 lines to support InformInfo registration.

Clearly the kernel needs a fast lookup cache for things like ipoib and
others. I don't think a kernel module needs or wants a full-on
distributed SA.

We're talking about PR caching only at this point, with possible extensions to support QoS. Other SA information is not cached or needed.

For all-to-all connections, current code does something like the following:

1. Resolves IP addresses to DGIDs using ARP. This results in IPoIB querying the SA and caching 1 PR per DGID.
2. Apps query the SA for PRs, with 1 PR query per DGID (see the sketch after this list). Eventually we'll get back the same set of PRs that IPoIB already had cached.
3. Establish the connections. The IB CM stores the PR information with each connection in order to set the QP attributes properly.
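
Roughly what step 2 looks like with the in-kernel SA query API. This is a hedged sketch only: I'm assuming the ib_sa_path_rec_get() interface and comp-mask names from <rdma/ib_sa.h>, 'client' would have to be registered with ib_sa_register_client(), and real code would keep the query handles around so it can cancel them.

#include <linux/string.h>
#include <rdma/ib_verbs.h>
#include <rdma/ib_sa.h>

static void pr_done(int status, struct ib_sa_path_rec *resp, void *context)
{
	if (!status)
		memcpy(context, resp, sizeof(struct ib_sa_path_rec));
}

static int query_all_paths(struct ib_sa_client *client, struct ib_device *dev,
			   u8 port, union ib_gid *sgid, union ib_gid *dgids,
			   struct ib_sa_path_rec *results, int n)
{
	struct ib_sa_path_rec rec;
	struct ib_sa_query *query;
	int i, ret;

	for (i = 0; i < n; i++) {	/* N separate round trips to the SA */
		memset(&rec, 0, sizeof rec);
		rec.sgid      = *sgid;
		rec.dgid      = dgids[i];
		rec.numb_path = 1;
		ret = ib_sa_path_rec_get(client, dev, port, &rec,
					 IB_SA_PATH_REC_SGID |
					 IB_SA_PATH_REC_DGID |
					 IB_SA_PATH_REC_NUMB_PATH,
					 1000 /* ms */, GFP_KERNEL,
					 pr_done, &results[i], &query);
		if (ret < 0)
			return ret;
	}
	return 0;
}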

We end up with redundant queries and the PR being cached in multiple places. One optimization is to replace the N PR queries with a single, more efficient GetTable query. A second optimization is to centralize the PR caching. The local SA does the first, and starts us down the road of the second.
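
For reference, that single GetTable query is just a standard SA MAD. A hedged sketch of the request header follows; the class, method, and attribute constants are the standard ones from <rdma/ib_mad.h> and <rdma/ib_sa.h>, but matching on SGID alone (so the SA returns every path originating at the local port) is my assumption about how the bulk load would work.

#include <linux/string.h>
#include <rdma/ib_mad.h>
#include <rdma/ib_sa.h>

static void build_gettable_pr_hdr(struct ib_mad_hdr *hdr)
{
	memset(hdr, 0, sizeof *hdr);
	hdr->base_version  = IB_MGMT_BASE_VERSION;
	hdr->mgmt_class    = IB_MGMT_CLASS_SUBN_ADM;		/* SA class (0x03) */
	hdr->class_version = 2;					/* SA class version */
	hdr->method        = IB_SA_METHOD_GET_TABLE;		/* 0x12 */
	hdr->attr_id       = cpu_to_be16(IB_SA_ATTR_PATH_REC);	/* 0x0035 */
	/* The SA header's component mask would select SGID only, and the
	 * data portion would carry a path record with sgid set to the
	 * local port GID.  The RMPP-segmented GetTableResp then returns
	 * all matching PRs in one exchange instead of N queries. */
}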

I personally think a simple, small in-kernel fast lookup cache, merged
with the ipoib cache and with a netlink interface to userspace to
add/delete/flush entries, is a very good solution that will remain
useful in the future. netlink would also carry cache-miss queries to
userspace. In the absence of a daemon the kernel could query on its own
but cache very conservatively. A userspace version of the very
aggressive cache you have now could also be created right away.
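
To make that concrete, the generic netlink command set such an interface implies might look something like the following. The family name, commands, and attributes are all invented for illustration; they would be registered with genl_register_family(), and nothing like this exists in the tree today.

#define IB_PR_CACHE_GENL_NAME		"ib_pr_cache"	/* hypothetical family */
#define IB_PR_CACHE_GENL_VERSION	1

enum ib_pr_cache_cmd {
	IB_PR_CACHE_CMD_ADD,	/* userspace daemon loads one path record */
	IB_PR_CACHE_CMD_DEL,	/* invalidate a single entry */
	IB_PR_CACHE_CMD_FLUSH,	/* drop the entire cache */
	IB_PR_CACHE_CMD_MISS,	/* kernel -> userspace: lookup missed, resolve it */
};

enum ib_pr_cache_attr {
	IB_PR_CACHE_ATTR_UNSPEC,
	IB_PR_CACHE_ATTR_SGID,		/* 16-byte source GID */
	IB_PR_CACHE_ATTR_DGID,		/* 16-byte destination GID */
	IB_PR_CACHE_ATTR_PATH_REC,	/* path record in SA wire format */
	__IB_PR_CACHE_ATTR_MAX,
};
#define IB_PR_CACHE_ATTR_MAX (__IB_PR_CACHE_ATTR_MAX - 1)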

I believe that the PR caching should be done outside of IPoIB. Other paths may exist that IPoIB does not use.

This is because I firmly do not believe in caching as a solution to the
scalability problems. They must be solved with some level of replication
and distribution of the SA data and algorithms.

PR caching *is* replication of the SA data. The local SA works with all existing SAs. It is not tied to one vendor, nor does it require changes to the SAs. Sure, we can define vendor-specific protocols to assist with or optimize synchronization, but I don't believe that's necessary for an initial submission. (In fact, I think it's undesirable at this point, since it would require changes to the SA.)

Maybe you could summarise how the user/kernel interface works?  The
last I saw was something based on MADs that looked very inefficient
compared with netlink.

I suggested a MAD interface to the local SA as being the most extensible. It allows interacting with the cache from a local or remote node in a very IB-native fashion. The local SA is reached over QP1, and any new protocols can re-use the existing SA MAD format.

For example, the cache could be loaded using a 'SetTable PR' MAD. It doesn't matter if the MAD is sent from a local user space daemon, some distributed SA agent, or the master SA. Paths can be invalidated by sending 'Delete PR' MADs.
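
A hedged sketch of what those two MADs might look like, parallel to the GetTable header above: the SA class, the Delete method, and the PathRecord attribute ID are the standard definitions, but 'SetTable' is not a defined SA method, so its value below is only a placeholder for the proposed extension.

#include <linux/string.h>
#include <rdma/ib_mad.h>
#include <rdma/ib_sa.h>

#define LOCAL_SA_METHOD_SET_TABLE 0x20	/* placeholder, not in the IBA spec */

/* 'SetTable PR': load the cache with an (RMPP-segmented) table of PRs.
 * It doesn't matter who sends it - a local daemon, a distributed SA
 * agent, or the master SA. */
static void build_set_table_pr(struct ib_mad_hdr *hdr)
{
	memset(hdr, 0, sizeof *hdr);
	hdr->base_version  = IB_MGMT_BASE_VERSION;
	hdr->mgmt_class    = IB_MGMT_CLASS_SUBN_ADM;	/* existing SA MAD format */
	hdr->class_version = 2;
	hdr->method        = LOCAL_SA_METHOD_SET_TABLE;
	hdr->attr_id       = cpu_to_be16(IB_SA_ATTR_PATH_REC);
}

/* 'Delete PR': invalidate cached paths selected by the component mask. */
static void build_delete_pr(struct ib_mad_hdr *hdr)
{
	memset(hdr, 0, sizeof *hdr);
	hdr->base_version  = IB_MGMT_BASE_VERSION;
	hdr->mgmt_class    = IB_MGMT_CLASS_SUBN_ADM;
	hdr->class_version = 2;
	hdr->method        = IB_SA_METHOD_DELETE;	/* standard SA Delete */
	hdr->attr_id       = cpu_to_be16(IB_SA_ATTR_PATH_REC);
}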

It may also be possible to extend such an interface for QoS purposes.

- Sean