Is there any plan to support NUMA memory binding for tasks?

Even with bind-to-core and memory affinity in 1.4.3 we were seeing 15-20%
variation in run times on a Nehalem cluster.  This turned out to be mostly due
to bad page placement.  Residual pagecache pages from the last job on a node (or
the memory of a suspended job in the case of preemption) could occasionally cause
a lot of non-local page placement.  We hacked the libnuma module to MPOL_BIND
tasks to their local memory and eliminated the majority of this variability.
We are currently running with this as default behaviour since it's "the right
thing" for 99% of jobs (we have an environment variable to back off to affinity
for the rest).
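For anyone unfamiliar with what "MPOL_BIND tasks to their local memory" involves, here is a minimal C sketch of the idea. It is not the libnuma module change described above. It bypasses libnuma and calls the Linux `getcpu` and `set_mempolicy` system calls directly through `syscall(2)`. The helper names `local_node_mask` and `bind_memory_to_local_node` are hypothetical. It assumes Linux, a task that is already pinned to one core (e.g. via bind-to-core), and node IDs below 63 so that the mask fits in a single `unsigned long`.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: MPOL_BIND == 2, as in the Linux UAPI <linux/mempolicy.h>.
   Copied here so the sketch needs neither kernel nor libnuma headers. */
#define SKETCH_MPOL_BIND 2

/* Hypothetical helper: build a single-node mask for set_mempolicy().
   The kernel only reads maxnode - 1 bits of the mask, so maxnode must be
   node + 2 for bit `node` to be seen. Node IDs 0..62 are accepted. */
static int local_node_mask(int node, unsigned long *mask, unsigned long *maxnode)
{
    if (node < 0 || node >= (int)(8 * sizeof *mask) - 1)
        return -EINVAL;
    *mask = 1UL << node;
    *maxnode = (unsigned long)node + 2;
    return 0;
}

/* Hypothetical helper: restrict all *future* page allocations of the calling
   thread to the NUMA node it is currently running on. Pages already
   allocated are not moved. Call only after the task has been pinned,
   otherwise the "local" node may change later.
   Returns 0 on success or a negative errno value. */
static int bind_memory_to_local_node(int *node_out)
{
    unsigned cpu = 0, node = 0;
    unsigned long mask = 0, maxnode = 0;

    /* getcpu(2): which CPU and NUMA node are we running on right now? */
    if (syscall(SYS_getcpu, &cpu, &node, NULL) != 0)
        return -errno;

    int rc = local_node_mask((int)node, &mask, &maxnode);
    if (rc != 0)
        return rc;

    /* set_mempolicy(2) with MPOL_BIND: allocate only from `node`. */
    if (syscall(SYS_set_mempolicy, SKETCH_MPOL_BIND, &mask, maxnode) != 0)
        return -errno;

    if (node_out != NULL)
        *node_out = (int)node;
    return 0;
}
```

A task would call `bind_memory_to_local_node()` right after being pinned and before allocating its working set. The choice that matters is MPOL_BIND versus the default (local-preferred) policy: under the default policy the kernel will fall back to a remote node when the local one is short of free memory, for example because it is full of leftover page cache, whereas MPOL_BIND confines allocations to the local node so the kernel has to reclaim locally instead. The flip side is that a task whose footprint exceeds its node's memory can fail or be OOM-killed, which is why an opt-out like the environment variable mentioned above is worth keeping.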

I'm guessing/hoping that doing the above with hwloc will be easier/more
maintainable. As a first pass, when is that likely to be an option?

David
