On Thu, Nov 06, 2008 at 12:51:54AM +0000, Mel Gorman wrote:
> On Thu, Nov 06, 2008 at 10:15:54AM +1100, David Gibson wrote:
> > On Wed, Nov 05, 2008 at 09:43:40AM -0600, Adam Litke wrote:
> > > On Wed, 2008-11-05 at 14:02 +0000, Mel Gorman wrote:
> > > > @@ -127,6 +140,34 @@ void *get_huge_pages(size_t len, ghp_t flags)
> > > >                 return NULL;
> > > >         }
> > > > 
> > > > +offset:
> > > 
> > > This cache-coloring code (below) is nice.  Could we move it into it's
> > > own function though?  I like the idea of caching the sysconf value, but
> > > perhaps you could do that as a static variable within the new
> > > function.
> > 
> > Yeah, I don't think this belongs in get_huge_pages().
> > get_huge_pages() is supposed to be the "raw" low-level interface to
> > just grab some hugepages.  Offsetting within the page should be up to
> > the caller, if it cares.
> > 
> 
> Lets take an average programmer who is tasked with converting the
> customer allocator used in OpenMPI[1] to use large pages.  He's very
> unlikely to take the cache alignment into account and instead be
> faced by weird slowdowns after porting the code. I don't think
> they'll dig through documentation, oprofile or anything else looking
> for the answer. They'll just assume hugepages are stupid and move
> on. Even if they know about cache line coloring, they'll assume the
> allocator is already doing it.
> 
> A caller that is really looking for X number of hugepages is going to align
> the length resulting in no wastage and no colouring. If for some reason
> they are not aligning their length but need the start hugepage-aligned,
> they would use GHP_ALIGN.

No, I still don't buy it - making assumptions about what the program's
intending based on details of the parameters is just dangerous
territory, even if you're right most of the time.  The basic problem
here is that you're making get_huge_pages() do two different things:
first act as a low-level "grab some huge pages" function, and second
act as a general handy "grab some memory that's in hugepages".  This
extension is appropriate for the second, but not the first.

Having the second is a reasonable thing - although to a certain extent
just using malloc() with HUGEPAGE_MORECORE enabled accomplishes this -
but I think it should be separate from the first.  I just don't see
any clear line between this sort of extension and a full blown
hugepage-backed malloc()-style allocator.  Again, not a bad idea in
principle, but not the purpose of get_huge_pages().

> My guess is that many callers will be relatively naive in terms of hugepage,
> will not align their length and will not care about the alignment of the
> start of the buffer. In that case defaulting to colouring is the path of
> least surprise for an allocator API.

Sure, so provide an API for this, but leave get_huge_pages() as is.
As you say this is "least surprise" *for a function that looks like a
more-or-less general allocator*.  It's emphatically *not* least
surprise (nor "minimum magic") for something that looks like it's a
"raw" interface to the kernel facility.  That was the original intent
of get_huge_pages(), and I think it's still what the name suggests it
is.

Let's not ram these two rather different interface styles into a
single function entry-point.  If we do start writing a general
hugepage backed allocator, ideally it will use the same code paths
that malloc() would go through when automatic
malloc()-hugepage-backing is enabled.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
Libhugetlbfs-devel mailing list
Libhugetlbfs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libhugetlbfs-devel

Reply via email to