On Jun 5, 2013, at 6:39 AM, Haggai Eran <[email protected]> wrote:
> Perhaps I'm missing something, but I believe ODP deals with the first
> two problems in the list (slide 8), even if it doesn't solve them
> completely.
Unfortunately, it does not. If we could register(0 ... 2^64) and never have to
worry about registered memory, that might be cool (depending on how that
actually works) -- more below.
See this blog post that describes the freed registered memory issue:
http://blogs.cisco.com/performance/registered-memory-rma-rdma-and-mpi-implementations/
and consider the following valid user code:
a = malloc(x);    // a gets (va=0x100, pa=0x12345) back from malloc
MPI_Send(a, ...); // MPI registers 0x100 for len=x, and saves (0x100,x) in reg cache
free(a);
a = malloc(x);    // a gets (va=0x100, pa=0x98765) back from malloc
MPI_Send(a, ...); // MPI sees a=0x100 and thinks that it is already registered
                  // ...kaboom
In short, MPI has to intercept free/sbrk/whatever so that it can update its
registration cache.
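
To make the failure mode concrete, here is a minimal sketch of a toy registration cache keyed only by virtual address. It is not how any real MPI implementation is written: the names reg_entry, reg_lookup_or_register, and reg_invalidate are made up for illustration, and the "pa" value merely stands in for whatever physical pages were pinned at registration time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy, single-entry registration cache (hypothetical; illustration only).
 * It is keyed by virtual address, which is exactly what makes it fragile. */
struct reg_entry {
    uintptr_t va;    /* virtual address that was registered */
    size_t    len;   /* length of the registered range */
    uintptr_t pa;    /* stand-in for the pages pinned at registration time */
    int       valid;
};

static struct reg_entry cache;

/* Hypothetical stand-in for "register with the HCA unless already cached".
 * current_pa models the physical pages backing va right now. Returns the
 * physical address the HCA would actually use for the transfer. */
static uintptr_t reg_lookup_or_register(uintptr_t va, size_t len,
                                        uintptr_t current_pa)
{
    assert(len > 0);
    if (cache.valid && cache.va == va && cache.len >= len)
        return cache.pa;   /* cache hit: possibly stale */
    cache = (struct reg_entry){ va, len, current_pa, 1 };
    return cache.pa;
}

/* What an intercepted free()/munmap()/sbrk() hook would have to call:
 * drop the entry if it overlaps the range being released. */
static void reg_invalidate(uintptr_t va, size_t len)
{
    if (cache.valid && va < cache.va + cache.len && cache.va < va + len)
        cache.valid = 0;
}
```

Without a call to reg_invalidate(0x100, x) between the free() and the second malloc(), the second lookup for va=0x100 returns the old pa=0x12345 even though the buffer is now backed by pa=0x98765. With the hook in place, the entry is dropped and the buffer is registered again.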
> In the future we want to implement an implicit memory region covering
> the entire process address space, thus eliminating the need for memory
> registration almost completely (you might still want memory
> registration, or memory windows, in order to control permissions of
> remote operations).
This would be great, as long as it's fast, transparent, and has no subtle
implementation effects (like causing additional RNR NAKs for pages that are
still in memory, which, according to your descriptions, it sounds like it
won't).
> We can also allow fork to work with our implementation. Copy-on-write
> will work with ODP regions by invalidating the HCA's page tables before
> modifying the pages to be read-only. A page fault from the HCA can then
> refill the pages, or even break COW in case of a write.
That would be cool, too. fork() has been a continuing problem -- solving that
problem would be wonderful.
If this ODP stuff becomes a new verb, it would be good:
- if these fork-fixing / register-infinite capabilities can be queried at run
time (maybe on ibv_device_cap_flags?) so that ULPs can know to use this
functionality
- if driver owners can get a heads up so that they can know to implement it
>> Why don't we have something like ummunotify yet?
> I think that the problem we are trying to solve is better handled inside
> the kernel. If you are going to change the HCA's memory mappings, you'd
> have to go through the kernel anyway.
If/when you allow registering all memory, then I think you're right -- the
MPI-must-intercept-free/sbrk-whatever issue may go away (that's why I started
this thread asking about register(0 .. 2^64)). But without that, unless I'm
missing something, I don't think it solves the MPI-must-catch-free-sbrk-etc.
issues...? And therefore, having some kind of ummunotify-like functionality as
a verb would be a Very Good Thing.
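
As a rough sketch of how a ULP might consume an ummunotify-like notification, assume the kernel exposes a generation counter that it bumps whenever a watched address range is unmapped. Everything below is hypothetical (kernel_generation, simulate_unmap_event, and the reg_cache helpers are illustrative names, not an existing verbs API), and the kernel side is simulated with a plain variable.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical notifier state. In a real ummunotify-like interface this
 * counter would be shared with the kernel; here it is a plain variable. */
static volatile uint64_t kernel_generation;

/* Simulates the kernel reporting that some watched range was unmapped. */
static void simulate_unmap_event(void)
{
    kernel_generation++;
}

/* Single-entry registration cache that remembers the last generation seen. */
struct reg_cache {
    uintptr_t va;
    size_t    len;
    int       valid;
    uint64_t  seen_generation;
};

/* Returns 1 if the cached registration may be reused, 0 if the caller
 * must register again. The cheap counter comparison replaces intercepting
 * free()/sbrk() in user space. */
static int reg_cache_usable(struct reg_cache *c, uintptr_t va, size_t len)
{
    if (c->seen_generation != kernel_generation) {
        c->valid = 0;   /* conservative: drop the entry on any event */
        c->seen_generation = kernel_generation;
    }
    return c->valid && c->va == va && c->len >= len;
}

/* Records a fresh registration made at the current generation. */
static void reg_cache_store(struct reg_cache *c, uintptr_t va, size_t len)
{
    c->va = va;
    c->len = len;
    c->valid = 1;
    c->seen_generation = kernel_generation;
}
```

The point of this design is that the common path (nothing was unmapped) costs one comparison, and the ULP never has to hook the allocator. A real interface would presumably also report which range changed, so that only overlapping entries need to be dropped.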
--
Jeff Squyres
[email protected]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/