HP-MPI is doing pretty much the same thing.

--CQ

> -----Original Message-----
> From: general-boun...@lists.openfabrics.org
> [mailto:general-boun...@lists.openfabrics.org] On Behalf Of
> Jeff Squyres
> Sent: Thursday, May 07, 2009 8:54 AM
> To: Roland Dreier
> Cc: Pavel Shamis; Hans Westgaard Ry; Terry Dontje; Lenny
> Verkhovsky; Håkon Bugge; Donald Kerr; OpenFabrics General;
> Alexander Supalov
> Subject: Re: [ofa-general] Memory registration redux
>
> On May 6, 2009, at 4:10 PM, Roland Dreier (rdreier) wrote:
>
> > By the way, what's the desired behavior of the cache if a process
> > registers, say, address range 0x1000 ... 0x3fff, and then the same
> > process registers address range 0x2000 ... 0x2fff (with all the same
> > permissions, etc)?
> >
> > The initial registration creates an MR that is still valid for the
> > smaller virtual address range, so the second registration is much
> > cheaper if we used the cached registration; but if we use the cache
> > for the second registration, and then deregister the first one, we're
> > stuck with a too-big range pinned in the cache because of the second
> > registration.
>
> I don't know what the other MPIs do in this scenario, but
> here's what OMPI will do:
>
> 1. lookup 0x1000-0x3fff in the cache; not find any of it,
>    and therefore register
>    - add each page to our cache with a refcount of 1
> 2. lookup 0x2000-0x2fff in the cache, find that all the pages
>    are already registered
>    - refcount++ on each page in the cache
> 3. when we go to dereg 0x1000-0x3fff
>    - refcount-- on each page in the cache
>    - since some pages in the range still have refcount>0,
>      don't do anything further
>
> Specifically: the actual dereg of 0x1000-0x3fff is blocked on
> also releasing 0x2000-0x2fff.
>
> Note that OMPI will only register a max of X bytes at a time
> (where X defaults to 2MB). So even if a user calls
> MPI_SEND(...) with an enormous buffer, we'll register it
> X/page_size pages at a time, not the entire buffer at once.
> Hence, the "buffer A is blocked from dereg'ing by buffer B"
> scenario is *somewhat* mitigated -- it's less wasteful than
> if we had registered/cached the entire huge buffer at once.
>
> Finally, note that if 0x2000-0x2fff had not been registered,
> the 0x1000-0x3fff pages are not actually deregistered when
> all the pages' refcounts go to 0 -- they are just moved to the
> "able to be dereg'ed" list. We don't actually dereg them until
> we later try to reg new memory and fail due to lack of resources.
> Then we take entries off the "able to be dereg'ed" list and
> dereg them, then try reg'ing the new memory again.
>
> MVAPICH: do you guys do similar things?
>
> (I don't know if HP/Scali/Intel will comment on their
> registration cache schemes)
>
> --
> Jeff Squyres
> Cisco Systems
>
> _______________________________________________
> general mailing list
> general@lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit
> http://openib.org/mailman/listinfo/openib-general
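
The scheme Jeff describes for OMPI (per-page refcounts, a dereg that is
blocked while any page is still in use, an "able to be dereg'ed" list that
is only drained when a new registration runs out of resources, and
registering at most X bytes at a time) can be sketched as a toy model.
This is not OMPI's code and makes no real verbs calls. The names
RegistrationCache, max_pinned_pages and chunks are hypothetical, the 4 KiB
page size is an assumption, and "lack of resources" is simulated with a
simple page limit.

```python
PAGE_SIZE = 0x1000  # assumption: 4 KiB pages, matching the addresses in the thread


class RegistrationCache:
    """Toy model only: registrations are simulated, nothing is really pinned."""

    def __init__(self, max_pinned_pages):
        self.max_pinned_pages = max_pinned_pages  # hypothetical resource limit
        self.owner = {}     # page number -> id of the simulated MR that pinned it
        self.mr_pages = {}  # MR id -> pages pinned by that MR
        self.refcount = {}  # page number -> number of live users
        self.idle = []      # the "able to be dereg'ed" list, oldest first
        self.next_id = 0

    @staticmethod
    def _pages(start, end):
        # Ranges are inclusive, as in "0x1000 ... 0x3fff".
        return list(range(start // PAGE_SIZE, end // PAGE_SIZE + 1))

    def _really_deregister(self, mr):
        # A real implementation would deregister the MR with the HCA here.
        self.idle.remove(mr)
        for page in self.mr_pages.pop(mr):
            del self.owner[page]
            del self.refcount[page]

    def register(self, start, end):
        pages = self._pages(start, end)
        missing = [p for p in pages if p not in self.owner]
        touched = {self.owner[p] for p in pages if p in self.owner}
        # Simulated "reg failed for lack of resources": reclaim idle MRs
        # that this request does not reuse, then try again.
        victims = [mr for mr in self.idle if mr not in touched]
        while len(self.owner) + len(missing) > self.max_pinned_pages:
            if not victims:
                raise MemoryError("nothing left on the dereg list to reclaim")
            self._really_deregister(victims.pop(0))
        if missing:  # cache miss: one new simulated MR for the uncached pages
            mr, self.next_id = self.next_id, self.next_id + 1
            self.mr_pages[mr] = missing
            for page in missing:
                self.owner[page] = mr
                self.refcount[page] = 0
        for mr in touched:  # a cached MR that was idle is in use again
            if mr in self.idle:
                self.idle.remove(mr)
        for page in pages:
            self.refcount[page] += 1

    def deregister(self, start, end):
        pages = self._pages(start, end)
        for page in pages:
            self.refcount[page] -= 1
        # An MR only becomes reclaimable once every one of its pages is unused.
        for mr in sorted({self.owner[p] for p in pages}):
            if mr not in self.idle and all(
                self.refcount[p] == 0 for p in self.mr_pages[mr]
            ):
                self.idle.append(mr)


def chunks(start, length, max_bytes=2 * 1024 * 1024):
    """Yield inclusive (start, end) ranges of at most max_bytes (X) each."""
    offset = 0
    while offset < length:
        size = min(max_bytes, length - offset)
        yield (start + offset, start + offset + size - 1)
        offset += size


cache = RegistrationCache(max_pinned_pages=4)
cache.register(0x1000, 0x3fff)    # step 1: miss, pages pinned with refcount 1
cache.register(0x2000, 0x2fff)    # step 2: hit, refcount++ on the cached page
cache.deregister(0x1000, 0x3fff)  # step 3: one page still in use, nothing released
cache.deregister(0x2000, 0x2fff)  # all refcounts are 0: moved to the idle list
cache.register(0x8000, 0x9fff)    # over the limit: idle entry dereg'ed, then retry
```

In this model the first registration stays pinned as a whole until the
overlapping one is released, which is the "buffer A is blocked by buffer B"
effect from the thread. Feeding a large buffer through chunks() before
register() keeps each cached entry at X bytes or less, which limits how
much memory one small overlapping registration can hold hostage.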