On Thu, 10 Jun 2010 17:59:28 +0300
Alex Vainman <[email protected]> wrote:

> Wrote Roland Dreier:
> > Thanks, nice work.  I like this approach.  Alex (Vainman) any comments
> > on this?
> > 
> >  - R.
> 
> The solution looks great.

Hi all,

in our further testing, we noticed a substantial problem with the current
solution. Depending on the order in which memory regions are registered and
deregistered, we can end up with a corrupted node tree, which blocks further
regions from being registered.

When registering two memory regions A and B from within the same huge page,
we end up with one node in the tree which covers the whole huge page after
registering A. When the second MR is registered, a node is created with the
MR size rounded to the system page size. Since the range is already covered,
there is no need to call madvise(), so it goes unnoticed that MR B is part of
a huge page.
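
To make the mismatch concrete, here is a minimal sketch of the two roundings.
It is not the libibverbs code: round_to_pages() and struct range are
hypothetical names, and the 2 MiB huge page and 4 KiB system page sizes used
in the note below are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: an address range with an inclusive end. */
struct range {
    uintptr_t start;
    uintptr_t end;
};

/*
 * Round [base, base + len) outwards to multiples of page_size.
 * page_size must be a power of two.
 */
static struct range round_to_pages(uintptr_t base, size_t len,
                                   uintptr_t page_size)
{
    struct range r;

    assert(len > 0);
    assert(page_size > 0 && (page_size & (page_size - 1)) == 0);

    r.start = base & ~(page_size - 1);
    r.end = ((base + len + page_size - 1) & ~(page_size - 1)) - 1;
    return r;
}
```

With these assumed sizes, an MR A at 0x40001000 rounded to the huge page size
covers 0x40000000..0x401fffff, while an MR B at 0x40100000 with a length of
4196 bytes rounded to the system page size covers only 0x40100000..0x40101fff,
although both lie in the same huge page.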

Now if MR A is deregistered before MR B, I see that the tree containing
mem_nodes is empty afterwards, which causes problems for the deregistration of
MR B, leaving the tree in a corrupted state with negative refcounts. This also
breaks later registrations of other memory regions within this huge page.
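
The symptom can be reproduced in a toy model. This is only a sketch and not
the mem_node tree from libibverbs: it keeps one refcount per system page, all
toy_* names are hypothetical, and the step "deregistering A leaves the tree
empty" is hard-coded as an assumption that mirrors what I observe. It does
not explain why that happens.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGES 512 /* assumption: 2 MiB huge page / 4 KiB system pages */

/* Toy stand-in for the node tree: one refcount per system page. */
struct toy_tracker {
    int refcnt[TOY_PAGES];
};

/* Add delta to the refcount of every page in [first, last]. */
static void toy_adjust(struct toy_tracker *t, size_t first, size_t last,
                       int delta)
{
    assert(first <= last && last < TOY_PAGES);
    for (size_t i = first; i <= last; i++)
        t->refcnt[i] += delta;
}

/* ASSUMPTION: models "the tree is empty after deregistering A". */
static void toy_clear(struct toy_tracker *t)
{
    for (size_t i = 0; i < TOY_PAGES; i++)
        t->refcnt[i] = 0;
}

/* Smallest refcount currently stored in the tracker. */
static int toy_min(const struct toy_tracker *t)
{
    int min = t->refcnt[0];

    for (size_t i = 1; i < TOY_PAGES; i++)
        if (t->refcnt[i] < min)
            min = t->refcnt[i];
    return min;
}

/* Run both registrations, then deregister in the requested order. */
static int toy_scenario(int dereg_a_first)
{
    struct toy_tracker t = { { 0 } };
    const size_t b_first = 256, b_last = 257; /* MR B: two system pages */

    toy_adjust(&t, 0, TOY_PAGES - 1, +1); /* register A: whole huge page */
    toy_adjust(&t, b_first, b_last, +1);  /* register B: system pages only */

    if (dereg_a_first) {
        toy_clear(&t);                       /* deregister A (assumed) */
        toy_adjust(&t, b_first, b_last, -1); /* deregister B: underflow */
    } else {
        toy_adjust(&t, b_first, b_last, -1); /* deregister B */
        toy_clear(&t);                       /* deregister A (assumed) */
    }
    return toy_min(&t);
}
```

In this model, toy_scenario(1) ends with a minimum refcount of -1 on the
pages of MR B, while toy_scenario(0) ends cleanly at 0, which matches the
order dependence described above.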

At the moment I do not see an obvious solution for this, but it's clear that
an overhaul of this code is needed. I'm writing this to make sure that there
won't be a release of libibverbs containing this incomplete code, but also
to ask for comments from other people who might have an idea on how to fix
this.

Thanks for any comments!

Alex
