On Wed, Jun 05, 2013 at 04:53:48PM +0000, Jeff Squyres (jsquyres) wrote:
> On Jun 5, 2013, at 6:39 AM, Haggai Eran <[email protected]> wrote:
>
> > Perhaps I'm missing something, but I believe ODP deals with the first
> > two problems in the list (slide 8), even if it doesn't solve them
> > completely.
>
> Unfortunately, it does not.  If we could register(0 ... 2^64) and
> never have to worry about registered memory, that might be cool
> (depending on how that actually works) -- more below.
>
> See this blog post that describes the freed registered memory issue:
>
>     http://blogs.cisco.com/performance/registered-memory-rma-rdma-and-mpi-implementations/
>
> and consider the following valid user code:
>
> a = malloc(x);    // a gets (va=0x100, pa=0x12345) back from malloc
> MPI_Send(a, ...); // MPI registers 0x100 for len=x, and saves (0x100,x) in reg cache
> free(a);
> a = malloc(x);    // a gets (va=0x100, pa=0x98765) back from malloc
> MPI_Send(a, ...); // MPI sees a=0x100 and thinks that it is already registered
>                   // ...kaboom
>
> In short, MPI has to intercept free/sbrk/whatever so that it can
> update its registration cache.
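
To make the failure mode concrete, here is a toy model of the sequence
quoted above. It is only a sketch, not real verbs or MPI code: every
toy_* name is made up for illustration, a "registration" is just a saved
copy of the VA->PA translation, and the "synced" lookup stands in for an
HCA whose translation follows the process page table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model: one virtual page whose physical backing can change. */
struct toy_mm {
    uintptr_t va;   /* virtual address handed out by malloc */
    uintptr_t pa;   /* physical page currently backing it */
};

/* Hypothetical registration cache entry, keyed on (va, len). */
struct toy_reg {
    uintptr_t va;
    size_t len;
    uintptr_t pa;   /* translation captured at registration time */
    int valid;
};

/* Pinned-style registration: snapshot the VA->PA translation once. */
static void toy_register(struct toy_reg *r, const struct toy_mm *mm, size_t len)
{
    r->va = mm->va;
    r->len = len;
    r->pa = mm->pa;
    r->valid = 1;
}

/* Cache lookup that only compares (va, len), as in the quoted example. */
static int toy_cache_hit(const struct toy_reg *r, uintptr_t va, size_t len)
{
    return r->valid && r->va == va && r->len >= len;
}

/* Where DMA would land if the snapshot taken at registration is used. */
static uintptr_t toy_dma_target_snapshot(const struct toy_reg *r)
{
    return r->pa;
}

/* Where DMA would land if the translation tracks the current page table. */
static uintptr_t toy_dma_target_synced(const struct toy_mm *mm)
{
    return mm->pa;
}

/* free() followed by malloc() returning the same VA on a new physical page. */
static void toy_free_and_realloc(struct toy_mm *mm, uintptr_t new_pa)
{
    mm->pa = new_pa;
}

/* Replays the quoted sequence; returns 1 if the cached translation is stale. */
static int toy_demo(void)
{
    struct toy_mm mm = { 0x100, 0x12345 };   /* first malloc */
    struct toy_reg reg = { 0, 0, 0, 0 };

    toy_register(&reg, &mm, 64);             /* first MPI_Send registers */
    toy_free_and_realloc(&mm, 0x98765);      /* free + second malloc */

    assert(toy_cache_hit(&reg, mm.va, 64));  /* cache still says "registered" */
    return toy_dma_target_snapshot(&reg) != toy_dma_target_synced(&mm);
}
```

In this model the cache lookup still hits after the free/malloc pair, but
the saved translation points at the old page. The "synced" lookup never
goes stale because it reads the current mapping, which is why a cache
built on snapshots needs to be told about free/sbrk.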

ODP is supposed to completely solve this problem. The HCA's view and
the kernel's view of the virtual-to-physical mapping become 100%
synchronized, and there is no 'kaboom'. The kernel updates the HCA
after the free, and again after the 2nd malloc, to 100% match the
current virtual memory map in the process.

MPI still has to register the memory in the first place..

.. and somehow stuff has to be managed to avoid HCA page faults in
   common cases
.. and the feature must be discoverable
.. and and and ..

The biggest issue to me is going to be efficiently prefetching receive
buffers so that RNR NAKs are avoided in all common cases...

> solves the MPI-must-catch-free-sbrk-etc. issues...?  And therefore,
> having some kind of ummunotify-like functionality as a verb would be
> a Very Good Thing.

AFAIK the ummunotify user space API was nak'd by the core kernel
guys. I got the impression people thought it would be acceptable as an
rdma API, not a general API. So it is waiting on someone to recast the
function within verbs to make progress...

Jason
