To test get_huge_pages(), I converted STREAM to use it and made two observations.
1. Requiring the caller to align the request to a hugepage boundary is
   stupid.
2. The performance blew chunks in comparison to using malloc+morecore.

Point 1 is straightforward and dealt with in patch 1. Point 2 is more
subtle. Initially, I called get_huge_pages() three times, once for each
of the three arrays. Because every buffer is aligned to a hugepage
boundary, all three arrays start at the same offset modulo the cache
size and compete for the same cache sets, so performance sucked badly
but in a very non-obvious fashion. It can be fixed in two ways:

1. Call get_huge_pages() once with a length large enough to accommodate
   all three arrays.
2. Use the wasted bytes at the end of the allocation to select a random
   cache line within the buffer.

For an application, 1 is probably preferred, but it's unreasonable to
expect the average programmer to know they'll get poor cache behaviour
depending on how they call get_huge_pages(). Instead, patch 2 in this
series uses the wasted bytes to select a random cache line at the start
of the buffer and returns that to the caller. A knowledgeable caller
can disable this by specifying GHP_ALIGN if they have a good reason.
The sketches after the diffstat illustrate the problem and both fixes.

 alloc.c                |   84 +++++++++++++++++++++++++++++++++++++++--------
 hugetlbfs.h            |    2 +
 man/get_huge_pages.3   |    7 ++++
 tests/get_huge_pages.c |    8 ++++
 4 files changed, 86 insertions(+), 15 deletions(-)
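To make the cache problem concrete, the first conversion did roughly
the following (a sketch, not the actual STREAM patch; N is illustrative,
get_huge_pages() and GHP_DEFAULT are the existing API):

    #include <stdio.h>
    #include <hugetlbfs.h>

    #define N (2 * 1024 * 1024)  /* elements per array (illustrative) */

    int main(void)
    {
        /* Each call returns a hugepage-aligned buffer, so a[], b[]
         * and c[] all start at the same offset modulo the cache size
         * and compete for the same cache sets in the STREAM kernels. */
        double *a = get_huge_pages(N * sizeof(double), GHP_DEFAULT);
        double *b = get_huge_pages(N * sizeof(double), GHP_DEFAULT);
        double *c = get_huge_pages(N * sizeof(double), GHP_DEFAULT);

        if (!a || !b || !c) {
            fprintf(stderr, "hugepage allocation failed\n");
            return 1;
        }

        /* ... STREAM kernels run over a, b and c here ... */
        return 0;
    }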
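Fix 1 looks like this from the application side, assuming patch 1's
relaxed length handling; PAD is a hypothetical inter-array padding, in
elements, chosen so the arrays land on different cache sets:

    #define PAD 128  /* hypothetical padding, in doubles */

    /* One allocation sized for all three arrays plus padding; the
     * caller carves it up itself, so only buf is hugepage-aligned. */
    double *buf = get_huge_pages((3 * N + 2 * PAD) * sizeof(double),
                                 GHP_DEFAULT);
    double *a = buf;
    double *b = a + N + PAD;
    double *c = b + N + PAD;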
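Fix 2 is what patch 2 implements inside alloc.c. The sketch below shows
the idea only; the helper name, CACHELINE_SZ and the use of rand() are
illustrative, not the patch's actual code:

    #include <stdlib.h>

    #define CACHELINE_SZ 64  /* illustrative; real code would probe this */

    /* base is the hugepage-aligned mapping, len the caller's request
     * and aligned_len the request rounded up to a hugepage boundary.
     * The slack between the two picks a random cache line to return. */
    static void *cacheline_randomise(void *base, size_t len,
                                     size_t aligned_len)
    {
        size_t lines = (aligned_len - len) / CACHELINE_SZ;

        if (lines == 0)
            return base;
        return (char *)base + (rand() % lines) * CACHELINE_SZ;
    }

A caller with a good reason to want the old behaviour, e.g. because it
carves up a single large allocation itself as above, passes GHP_ALIGN
and gets the hugepage-aligned pointer back.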