On Dec 10, 2010, at 4:56 PM, David Singleton wrote:

> Is there any plan to support NUMA memory binding for tasks?

Yes.

For some details on what we're planning for affinity, see the BOF slides that I 
presented at SC'10 on the OMPI web site (under "publications").

> Even with bind-to-core and memory affinity in 1.4.3 we were seeing 15-20%
> variation in run times on a Nehalem cluster.  This turned out to be mostly
> due to bad page placement.  Residual pagecache pages from the last job on a
> node (or the memory of a suspended job in the case of preemption) could
> occasionally cause a lot of non-local page placement.  We hacked the libnuma
> module to MPOL_BIND tasks to their local memory and eliminated the majority
> of this variability.  We are currently running with this as the default
> behaviour since it's "the right thing" for 99% of jobs (we have an
> environment variable to back off to affinity for the rest).
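
For reference, the kind of bind-to-local hack you describe would look roughly 
like this with the libnuma v2 API (a sketch of the idea only, not your actual 
module code; compile with -lnuma):

/* Bind the calling task's memory to its local NUMA node with
 * MPOL_BIND semantics, assuming the task is already bound to a core.
 * Illustrative sketch only. */
#define _GNU_SOURCE
#include <numa.h>
#include <sched.h>
#include <stdio.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this system\n");
        return 1;
    }

    /* Find the NUMA node owning the CPU we are currently bound to. */
    int node = numa_node_of_cpu(sched_getcpu());

    /* MPOL_BIND: never place pages off this node, even if the kernel
     * has to reclaim stale pagecache from a previous job first. */
    struct bitmask *mask = numa_allocate_nodemask();
    numa_bitmask_setbit(mask, (unsigned int) node);
    numa_set_membind(mask);
    numa_free_nodemask(mask);

    printf("memory bound to node %d\n", node);
    return 0;
}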

What OS and libnuma version are you running?  It has been my experience that 
libnuma can lie on RHEL 5 and earlier.  My (possibly flawed) understanding is 
that this is because of a lack of proper kernel support; such "proper" kernel 
support was only added fairly recently (somewhere around 2.6.30).

That aside, it's somewhat disappointing that MPOL_PREFERRED is not working well 
and that you had to switch to MPOL_BIND.  :-(

Should we add an MCA parameter to switch between BIND and PREFERRED, and 
perhaps default to BIND?
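
Concretely, such a parameter would just select between the two policies when 
the module sets up memory affinity -- something like the sketch below, where 
the getenv() and its name are only stand-ins for a real MCA parameter:

/* Sketch of the BIND-vs-PREFERRED switch such an MCA parameter would
 * control.  "OMPI_MEMBIND_POLICY" is a hypothetical stand-in, not a
 * real Open MPI knob. */
#include <numa.h>
#include <stdlib.h>
#include <string.h>

void set_local_memory_policy(int node)
{
    const char *policy = getenv("OMPI_MEMBIND_POLICY"); /* hypothetical */

    if (policy != NULL && 0 == strcmp(policy, "preferred")) {
        /* MPOL_PREFERRED: allocate locally when possible, but fall
         * back to remote nodes rather than failing or reclaiming. */
        numa_set_preferred(node);
    } else {
        /* Default to MPOL_BIND, per the suggestion above. */
        struct bitmask *mask = numa_allocate_nodemask();
        numa_bitmask_setbit(mask, (unsigned int) node);
        numa_set_membind(mask);
        numa_free_nodemask(mask);
    }
}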

> I'm guessing/hoping doing the above based on hwloc will be easier/more
> maintainable. As a first pass, when is that likely to be an option?

The first pass of hwloc support will *only* be replacing the paffinity 
modules.  Memory support using hwloc is definitely planned, but if there are 
kernel issues, hwloc won't be any better than libnuma.
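
That said, once hwloc grows memory binding, the same bind-to-local setup 
should reduce to a few portable calls.  Roughly the following sketch, where 
hwloc_set_membind() and its policy/flag names are assumptions about the 
eventual interface, patterned on set_mempolicy() (link with -lhwloc):

/* Sketch: bind the calling thread to a core, then bind its memory to
 * that core's locality with a strict BIND policy.  The membind call
 * here is an assumed interface, not shipping hwloc code. */
#include <hwloc.h>
#include <stdio.h>

int main(void)
{
    hwloc_topology_t topo;
    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);

    hwloc_obj_t core = hwloc_get_obj_by_type(topo, HWLOC_OBJ_CORE, 0);
    if (core != NULL) {
        hwloc_set_cpubind(topo, core->cpuset, HWLOC_CPUBIND_THREAD);
        if (hwloc_set_membind(topo, core->cpuset,
                              HWLOC_MEMBIND_BIND,
                              HWLOC_MEMBIND_THREAD) < 0) {
            perror("hwloc_set_membind");
        }
    }

    hwloc_topology_destroy(topo);
    return 0;
}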

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

