To test get_huge_pages(), I converted the STREAM benchmark to use it and
made two observations.

1. Requiring the caller to align to a hugepage boundary is an unnecessary
   burden on the caller
2. Performance was far worse than using malloc with the hugepage-backed
   morecore

Point 1 is straightforward and dealt with in patch 1. Point 2 is more
subtle. Initially, I called get_huge_pages() three times, once for each of
the three arrays. Because each buffer started on a hugepage boundary,
corresponding offsets in the three arrays mapped to the same cache lines,
so the arrays kept evicting each other and performance suffered badly in a
very non-obvious fashion. It can be fixed in two ways:

1. Call get_huge_pages() once with a length large enough to accommodate all
   three arrays, placing each array at a different cache-line offset (see
   the sketch after this list)
2. Use the wasted bytes at the end of the allocation to select a random
   cache line within the allocated buffer
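
To illustrate option 1, here is a minimal sketch of a single allocation
carved into three staggered arrays. N, CACHELINE, the ALIGN_UP macro and
the staggering scheme are assumptions for the example, not part of the
patches; get_huge_pages(), gethugepagesize() and free_huge_pages() are the
usual libhugetlbfs entry points.

#include <stdio.h>
#include <stdlib.h>
#include <hugetlbfs.h>

#define N         2000000                /* STREAM array length, example value */
#define CACHELINE 64                     /* assumed cache-line size */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((size_t)(a) - 1))

int main(void)
{
        long hpage = gethugepagesize();
        size_t arr = N * sizeof(double);

        /* Give each array its own cache-line offset by padding the
         * per-array stride with one extra cache line. */
        size_t stride = ALIGN_UP(arr, CACHELINE) + CACHELINE;
        size_t len = ALIGN_UP(3 * stride, (size_t)hpage);

        char *buf = get_huge_pages(len, GHP_DEFAULT);
        if (!buf) {
                perror("get_huge_pages");
                return EXIT_FAILURE;
        }

        double *a = (double *)buf;
        double *b = (double *)(buf + stride);
        double *c = (double *)(buf + 2 * stride);

        a[0] = b[0] = c[0] = 0.0;        /* ... run the STREAM kernels ... */

        free_huge_pages(buf);
        return 0;
}

Because the three arrays now start at different offsets modulo the cache
size, they no longer compete for the same cache lines.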

For an application, option 1 is probably preferable, but it is unreasonable
to expect the average programmer to know that cache behaviour can degrade
depending on how they call get_huge_pages(). Instead, patch 2 in this
series uses the wasted bytes to select a random cache line near the start
of the buffer and returns that address to the caller. A knowledgeable
caller with a good reason can disable this behaviour by specifying
GHP_ALIGN.
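
To make the idea concrete, here is a sketch of how slack bytes could be
traded for a random cache-line offset. It illustrates the technique only
and is not the code in patch 2; offset_buffer(), CACHELINE and the use of
rand() are all assumptions for the example.

#include <stdlib.h>

#define CACHELINE 64    /* assumed cache-line size for the example */

/*
 * Hypothetical helper: the caller asked for 'len' bytes but the
 * allocation was rounded up to a whole number of hugepages, so
 * 'rounded - len' bytes are wasted anyway.  Spend some of them on a
 * random cache-line offset at the start of the buffer so that
 * identically-sized allocations do not all share the same cache lines.
 */
static void *offset_buffer(void *base, size_t len, size_t hpage_size)
{
        size_t rounded = (len + hpage_size - 1) & ~(hpage_size - 1);
        size_t spare_lines = (rounded - len) / CACHELINE;

        if (spare_lines == 0)
                return base;    /* no slack; return the aligned buffer */

        /* rand() stands in for whatever entropy source the library uses */
        return (char *)base + (rand() % spare_lines) * CACHELINE;
}

The caller still gets at least 'len' usable bytes; only the start address
moves within space that would otherwise have been wasted.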

 alloc.c                |   84 +++++++++++++++++++++++++++++++++++++++--------
 hugetlbfs.h            |    2 +
 man/get_huge_pages.3   |    7 ++++
 tests/get_huge_pages.c |    8 ++++
 4 files changed, 86 insertions(+), 15 deletions(-)

