On Thu, 10 Jun 2010 17:59:28 +0300 Alex Vainman <[email protected]> wrote:
> Wrote Roland Dreier:
> > Thanks, nice work. I like this approach. Alex (Vainman) any comments
> > on this?
> >
> > - R.
>
> The solution looks great.

Hi all,

in our further testing, we noticed that there is a substantial problem
with the current solution. Depending on the order of memory
registrations, we might end up with a corrupted node tree which blocks
regions from being registered.

When registering two memory regions A and B from within the same huge
page, we will end up with one node in the tree which covers the whole
huge page after registering A. When the second MR is registered, a node
is created with the MR size rounded to the system page size (as there is
no need to call madvise(), it is not noticed that MR B is part of a huge
page).

Now if MR A is deregistered before MR B, I see that the tree containing
mem_nodes is empty afterwards, which causes problems for the
deregistration of MR B, leaving the tree in a corrupted state with
negative refcounts. This also breaks later registrations of other memory
regions within this huge page.

At the moment I do not see an obvious solution for this, but it's clear
that an overhaul of this code is needed.

I'm writing this to make sure that there won't be a release of
libibverbs containing this incomplete code, but also to ask for comments
from other people who might have an idea on how to fix this.

Thanks for any comments!

Alex
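
To make the granularity mismatch described above concrete, here is a small
sketch of the rounding only. It is not the libibverbs code and does not
model the tree corruption itself. The page sizes (4 KiB and 2 MiB), the
addresses, and the names round_range, contains, node_a and node_b are all
assumptions chosen for illustration.

```python
# Toy illustration of the two rounding granularities, NOT libibverbs code.
SYSTEM_PAGE = 4 * 1024        # assumed system page size
HUGE_PAGE = 2 * 1024 * 1024   # assumed huge page size


def round_range(addr, length, page_size):
    """Round [addr, addr + length) outwards to page_size boundaries.

    Returns an inclusive (start, end) pair of byte addresses.
    page_size must be a power of two.
    """
    start = addr & ~(page_size - 1)
    end = ((addr + length + page_size - 1) & ~(page_size - 1)) - 1
    return start, end


def contains(outer, inner):
    """True if the inclusive range 'inner' lies fully inside 'outer'."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


# Hypothetical huge-page-aligned base address and two MRs inside that page.
HUGE_BASE = 0x40000000
mr_a = (HUGE_BASE + 0x1000, 0x2000)   # (address, length) of MR A
mr_b = (HUGE_BASE + 0x10000, 0x800)   # (address, length) of MR B

# MR A: the range ends up rounded to the huge page size.
node_a = round_range(mr_a[0], mr_a[1], HUGE_PAGE)
# MR B: no madvise() call is needed, so only system page rounding is applied.
node_b = round_range(mr_b[0], mr_b[1], SYSTEM_PAGE)

print("node A:", [hex(x) for x in node_a])
print("node B:", [hex(x) for x in node_b])
print("B inside A:", contains(node_a, node_b))
```

In this sketch the range for B is a small sub-range of the range for A, so
the two MRs are tracked at different granularities even though they share
one huge page. Any fix would presumably have to make registration and
deregistration agree on a single granularity for such regions.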
