Re: [Libhugetlbfs-devel] [PATCH 2/2] Use wasted bytes in get_huge_pages() to place buffers at different cache lines

Mel Gorman Thu, 06 Nov 2008 02:50:00 -0800

On Thu, Nov 06, 2008 at 12:06:23PM +1100, David Gibson wrote:
> On Thu, Nov 06, 2008 at 12:51:54AM +0000, Mel Gorman wrote:
> > On Thu, Nov 06, 2008 at 10:15:54AM +1100, David Gibson wrote:
> > > On Wed, Nov 05, 2008 at 09:43:40AM -0600, Adam Litke wrote:
> > > > On Wed, 2008-11-05 at 14:02 +0000, Mel Gorman wrote:
> > > > > @@ -127,6 +140,34 @@ void *get_huge_pages(size_t len, ghp_t flags)
> > > > >               return NULL;
> > > > >       }
> > > > > 
> > > > > +offset:
> > > > 
> > > > This cache-coloring code (below) is nice.  Could we move it into it's
> > > > own function though?  I like the idea of caching the sysconf value, but
> > > > perhaps you could do that as a static variable within the new
> > > > function.
> > > 
> > > Yeah, I don't think this belongs in get_huge_pages().
> > > get_huge_pages() is supposed to be the "raw" low-level interface to
> > > just grab some hugepages.  Offsetting within the page should be up to
> > > the caller, if it cares.
> > > 
> > 
> > Lets take an average programmer who is tasked with converting the
> > customer allocator used in OpenMPI[1] to use large pages.  He's very
> > unlikely to take the cache alignment into account and instead be
> > faced by weird slowdowns after porting the code. I don't think
> > they'll dig through documentation, oprofile or anything else looking
> > for the answer. They'll just assume hugepages are stupid and move
> > on. Even if they know about cache line coloring, they'll assume the
> > allocator is already doing it.
> > 
> > A caller that is really looking for X number of hugepages is going to align
> > the length resulting in no wastage and no colouring. If for some reason
> > they are not aligning their length but need the start hugepage-aligned,
> > they would use GHP_ALIGN.
> 
> No, I still don't buy it - making assumptions about what the program's
> intending based on details of the parameters is just dangerous
> territory, even if you're right most of the time. i


Thing is, functions like malloc already do things already. In the kernel
it's vaguely similar. kmalloc() for example colours on behalf of the
caller because it's the right thing to do where possible.

> The basic problem
> here is that you're making get_huge_pages() do two different things:
> first act as a low-level "grab some huge pages" function, and second
> act as a general handy "grab some memory that's in hugepages".  This
> extension is appropriate for the second, but not the first.
> 

I view it more as the latter - i.e. it's a function for grabbing memory that
is in hugepages. Certainly when I was doing a straight-forward converting
of STREAM to get_huge_pages(), I got caught by this as the impact of not
colouring was so severe. Look at this

http://www.csn.ul.ie/~mel/comparison-align-Add.png

1-get_huge_pages() calls get_huge_pages() once and places the 3 arrays used
by STREAM in it. 3-get_huge_page-coloured calls get_huge_pages() once for
each array - which is what a programmer is most likely to do and what I
started with.  Note how they perform similarly because this has my patch
applied and uses cache coloring.

3-get_huge_page-aligned uses GHP_ALIGN and behaves as get_huge_pages()
in 2.1-pre4 currently does. It's performance *blows* and it took me a few
hours to pin the problem on lack of colouring because I'm used of allocators
handling that sort of issue on my behalf. Instead I chased down blind alleys.

> Having the second is a reasonable thing - although to a certain extent
> just using malloc() with HUGEPAGE_MORECORE enabled accomplishes this -

Ordinarily yes but there is at least one case where a custom allocator is
involved and using HUGEPAGE_MORECORE does nothing.  get_huge_pages() was
meant to help those guys port.

> but I think it should be separate from the first.  I just don't see
> any clear line between this sort of extension and a full blown
> hugepage-backed malloc()-style allocator.  Again, not a bad idea in
> principle, but not the purpose of get_huge_pages().
> 

I've no interest in writing a hugepage-backed malloc()-style allocator.
There are plenty of people who write allocators already. Cache colouring is
simple and if the caller is passing in stupid lengths, we might as well help
them out instead of throwing them curve balls with cache issues.

> > My guess is that many callers will be relatively naive in terms of hugepage,
> > will not align their length and will not care about the alignment of the
> > start of the buffer. In that case defaulting to colouring is the path of
> > least surprise for an allocator API.
> 
> Sure, so provide an API for this, but leave get_huge_pages() as is.

It feels like function bloat when the flags parameter is there to
get_huge_pages() for purposes just like this. I'm actually going to be
surprised if needs GHP_ALIGN. A caller that knows it needs hugepage-aligned
memory is also going to know they should pass in a hugepage-aligned length
to avoid wastage.

What would you suggest as an interface and why it is better than
get_huge_pages() doing something sensible by default and providing GHP_ALIGN
to avoid the ugliness that is malloc() vs posix_memalign()?

> As you say this is "least surprise" *for a function that looks like a
> more-or-less general allocator*.  It's emphatically *not* least
> surprise (nor "minimum magic") for something that looks like it's a
> "raw" interface to the kernel facility. 

I *wrote* the bloody thing and still got caught by the default aligning to
the same boundary. Cache colouring is easy to implement but also very easy
to forget why it is important. I'd prefer it do something that performs well
if it can instead of deliberatly sucking. If I forget about cache
colouring, it also means I didn't care what the alignment of the pointer
I got back was as long as it points to valid memory.

> That was the original intent
> of get_huge_pages(), and I think it's still what the name suggests it
> is.
> 

I regret that name now.

> Let's not ram these two rather different interface styles into a
> single function entry-point.  If we do start writing a general
> hugepage backed allocator, ideally it will use the same code paths
> that malloc() would go through when automatic
> malloc()-hugepage-backing is enabled.
> 

We just want to provide an interface suitable for people that writing
allocators or are porting parts of their application to use hugepages where
it is sensible. The former should be already coding to not waste any memory
and the latter should be helped as much as possible by defaulting to behaviour
whose performance does not suck.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
Libhugetlbfs-devel mailing list
Libhugetlbfs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libhugetlbfs-devel

Re: [Libhugetlbfs-devel] [PATCH 2/2] Use wasted bytes in get_huge_pages() to place buffers at different cache lines

Reply via email to