We are trying to improve the reliability of the libhugetlbfs malloc
function by prefaulting the huge pages at allocation time.  If we do
this, we can guarantee (for apps which do not fork) that all of the
pages are available, and fall back to normal pages if not enough huge
pages are available.
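A minimal sketch of the prefault-then-fall-back idea in C. It assumes huge pages come from mmap(MAP_HUGETLB) rather than from a file on a hugetlbfs mount, and that prefaulting is done with mlock() as described here. The names alloc_prefaulted(), free_prefaulted() and struct prefault_alloc are hypothetical and are not part of the libhugetlbfs API.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Hypothetical record of one prefaulted allocation. */
struct prefault_alloc {
    void  *addr;
    size_t len;
    int    huge;   /* 1 if backed by huge pages, 0 if we fell back to malloc */
};

/* Try to back len bytes with huge pages and fault them all in up front.
 * If the mapping or the prefault fails (no free huge pages, len not a
 * multiple of the huge page size, RLIMIT_MEMLOCK too low, ...), release
 * the mapping and fall back to ordinary demand-faulted memory. */
static struct prefault_alloc alloc_prefaulted(size_t len)
{
    struct prefault_alloc a = { NULL, len, 0 };
#ifdef MAP_HUGETLB
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p != MAP_FAILED) {
        /* mlock() faults every page in now, so a huge page shortage is
         * reported here instead of surfacing later on first touch. */
        if (mlock(p, len) == 0) {
            a.addr = p;
            a.huge = 1;
            return a;
        }
        munmap(p, len);
    }
#endif
    a.addr = malloc(len);   /* fallback: normal pages */
    return a;
}

/* Release memory obtained from alloc_prefaulted(). */
static void free_prefaulted(struct prefault_alloc *a)
{
    if (a->huge) {
        munlock(a->addr, a->len);
        munmap(a->addr, a->len);
    } else {
        free(a->addr);
    }
    a->addr = NULL;
}
```

The caller never has to care which path was taken; only the free routine looks at the huge flag.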

The problem with prefaulting (via mlock) is that huge pages are
preferentially taken from the NUMA node of the CPU that called malloc,
not from the nodes where the memory is first accessed.  This leads to
serious performance regressions: in those cases the application runs
slower with huge pages than it did with normal pages.
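One way to observe the placement problem is to ask the kernel which node backs each page of a prefaulted buffer. This sketch calls get_mempolicy(2) with MPOL_F_NODE | MPOL_F_ADDR through syscall() so it needs no libnuma headers; node_of_addr() and pages_on_node() are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Flag values from <linux/mempolicy.h>, repeated here so the sketch
 * does not depend on libnuma or kernel headers. */
#ifndef MPOL_F_NODE
#define MPOL_F_NODE (1 << 0)
#endif
#ifndef MPOL_F_ADDR
#define MPOL_F_ADDR (1 << 1)
#endif

/* Return the NUMA node backing the page that contains addr, or -1 on
 * error. With MPOL_F_NODE | MPOL_F_ADDR the kernel stores the node id in
 * the first argument; a page not yet allocated is faulted in as if read. */
static int node_of_addr(void *addr)
{
    int node = -1;
    if (syscall(SYS_get_mempolicy, &node, NULL, 0UL, addr,
                (unsigned long)(MPOL_F_NODE | MPOL_F_ADDR)) != 0)
        return -1;
    return node;
}

/* Count how many of the npages pages starting at base live on node. */
static size_t pages_on_node(char *base, size_t npages, size_t page_size,
                            int node)
{
    size_t count = 0;
    for (size_t i = 0; i < npages; i++)
        if (node_of_addr(base + i * page_size) == node)
            count++;
    return count;
}
```

Calling pages_on_node() once per node right after the mlock() prefault would show whether every page came from the allocating CPU's node.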

The solution, for these applications, is to use the NUMA API to set
bind/interleave policies that make sense for the application in
question.  My question for you is: what tuning scenarios would you
need an environment variable controlling this behavior to cover?
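A sketch of applying an interleave policy to a single allocation: mbind(2) with MPOL_INTERLEAVE is set on the not-yet-faulted range, and only then is the range prefaulted with mlock(). It calls syscall() directly to avoid a libnuma dependency; interleave_and_prefault() and its nr_nodes parameter are hypothetical, and the caller is assumed to know how many nodes to spread across.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Policy mode from <linux/mempolicy.h>, repeated to avoid extra headers. */
#ifndef MPOL_INTERLEAVE
#define MPOL_INTERLEAVE 3
#endif

/* Interleave [addr, addr+len) over nodes 0..nr_nodes-1, then prefault it.
 * addr must be page aligned (mmap results are). The policy has to be set
 * before the pages are touched: without MPOL_MF_MOVE, mbind() only
 * affects pages allocated after the call.
 * A failing mbind() is ignored, so the range keeps the task's existing
 * policy (e.g. one set with numactl). Returns mlock()'s result. */
static int interleave_and_prefault(void *addr, size_t len, int nr_nodes)
{
    unsigned long nodemask = 0;
    const int mask_bits = (int)(8 * sizeof(nodemask));

    if (nr_nodes > mask_bits)
        nr_nodes = mask_bits;   /* this sketch handles one word of nodes */
    for (int n = 0; n < nr_nodes; n++)
        nodemask |= 1UL << n;

    if (nr_nodes > 0)
        /* Like libnuma, pass the mask width plus one as maxnode,
         * because the kernel ignores the top bit it is given. */
        (void)syscall(SYS_mbind, addr, len, (unsigned long)MPOL_INTERLEAVE,
                      &nodemask, (unsigned long)mask_bits + 1, 0UL);

    return mlock(addr, len);
}
```

Setting the policy after mlock() instead would leave the already-placed pages on the allocating node, which is exactly the problem being solved.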

The two options we mulled over:
 - Do nothing: rely on the previously set NUMA policy (numactl).
   This will generally exhaust all node-local huge pages first, then
   move on to other nodes.
 - Interleave on all nodes.

Would these be enough to cover the cases you see when running various
workloads on NUMA systems?  Which option should be the default?  My gut
feeling is that option 1 above should be the default.  It handles
single-threaded apps that don't bounce around multiple NUMA nodes.
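The two options map onto a small enum chosen once from the environment. In this sketch HUGETLB_NUMA_POLICY is a placeholder variable name, not an existing libhugetlbfs setting, and an unset or unrecognised value selects option 1 (keep the numactl policy) as the default.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Hypothetical policy choices mirroring the two options discussed. */
enum hugepage_numa_policy {
    NUMA_POLICY_INHERIT,     /* option 1: keep the policy set by numactl */
    NUMA_POLICY_INTERLEAVE   /* option 2: interleave on all nodes */
};

/* Map a setting string to a policy; NULL or unknown values fall back
 * to the default, NUMA_POLICY_INHERIT. */
static enum hugepage_numa_policy parse_numa_policy(const char *val)
{
    if (val != NULL && strcmp(val, "interleave") == 0)
        return NUMA_POLICY_INTERLEAVE;
    return NUMA_POLICY_INHERIT;
}

/* Read the (placeholder) environment variable once at startup. */
static enum hugepage_numa_policy numa_policy_from_env(void)
{
    return parse_numa_policy(getenv("HUGETLB_NUMA_POLICY"));
}
```

Defaulting unknown values to "inherit" means a typo in the variable never makes placement worse than plain numactl would.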

That still leaves one issue: applications that move among NUMA nodes
unpredictably after starting up will always perform worse under this new
algorithm than before, since demand faulting allows each page to be
instantiated node-local to wherever the process happens to be running
when it first touches that page.

Ok, I am beginning to ramble.  I'll leave it here for now.  Hope this
all makes sense, and if not, let me know.

-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center


_______________________________________________
Libhugetlbfs-devel mailing list
Libhugetlbfs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libhugetlbfs-devel