On Dec 10, 2010, at 4:56 PM, David Singleton wrote:

> Is there any plan to support NUMA memory binding for tasks?

Yes. For some details on what we're planning for affinity, see the BOF slides that I presented at SC'10 on the OMPI web site (under "publications").

> Even with bind-to-core and memory affinity in 1.4.3 we were seeing 15-20%
> variation in run times on a Nehalem cluster. This turned out to be mostly
> due to bad page placement. Residual pagecache pages from the last job on a
> node (or the memory of a suspended job in the case of preemption) could
> occasionally cause a lot of non-local page placement. We hacked the libnuma
> module to MPOL_BIND tasks to their local memory and eliminated the majority
> of this variability. We are currently running with this as default
> behaviour since it's "the right thing" for 99% of jobs (we have an
> environment variable to back off to affinity for the rest).

What OS and libnuma version are you running? It has been my experience that libnuma can lie on RHEL 5 and earlier. My (possibly flawed) understanding is that this is due to a lack of proper kernel support; such support was only added fairly recently (2.6.30-something).

That aside, it's somewhat disappointing that MPOL_PREFERRED is not working well and that you had to switch to MPOL_BIND. :-(

Should we add an MCA parameter to switch between BIND and PREFERRED, and perhaps default to BIND?
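Just to make the BIND/PREFERRED distinction concrete, here's a rough sketch of what such a toggle would boil down to at the set_mempolicy(2) level. This is not Open MPI's actual maffinity/libnuma module code; the USE_MPOL_BIND environment variable is made up for illustration, and it assumes the libnuma v2 API and fewer than 64 NUMA nodes:

/* mpol_choice.c -- illustrative sketch only, not OMPI code.
 * Build: gcc mpol_choice.c -lnuma
 */
#define _GNU_SOURCE
#include <numa.h>      /* numa_available(), numa_node_of_cpu() */
#include <numaif.h>    /* set_mempolicy(), MPOL_* */
#include <sched.h>     /* sched_getcpu() */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this system\n");
        return 1;
    }

    /* Find the NUMA node of the core this (already-bound) task runs on. */
    int node = numa_node_of_cpu(sched_getcpu());

    /* Hypothetical toggle, standing in for an MCA parameter:
     * 1 = hard bind, 0 = preferred only. */
    const char *s = getenv("USE_MPOL_BIND");
    int mode = (s && atoi(s)) ? MPOL_BIND : MPOL_PREFERRED;

    /* Nodemask with only the local node set (assumes node < 64). */
    unsigned long mask = 1UL << node;

    /* MPOL_BIND: allocations must come from this node -- the kernel
     * reclaims local pages (e.g. stale pagecache) rather than spilling
     * to a remote node.  MPOL_PREFERRED: try this node first, but
     * silently fall back to remote nodes when the local node is full,
     * which is exactly how residual pagecache causes non-local
     * placement. */
    if (set_mempolicy(mode, &mask, sizeof(mask) * 8) != 0) {
        perror("set_mempolicy");
        return 1;
    }

    /* All subsequent page allocations in this task follow the policy. */
    return 0;
}

Defaulting such a parameter to BIND would match your experience (reclaiming stale pagecache is almost always preferable to remote placement), with PREFERRED available as the escape hatch for jobs that would otherwise fail local allocations.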
> I'm guessing/hoping doing the above based on hwloc will be easier/more
> maintainable. As a first pass, when is that likely to be an option?

The first pass of hwloc support will *only* be replacing the paffinity modules. Memory support using hwloc is definitely planned, but if there are kernel issues, hwloc won't be any better than libnuma.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/