On Feb 14, 2011, at 9:35 AM, Siew Yin Chan wrote:

> 1. I tried Open MPI 1.5.1 before turning to hwloc-bind. Yep, Open MPI 1.5.1 
> does provide the --bycore and --bind-to-core options, but they seem to bind 
> processes to cores on my machine according to the *physical* indexes:

FWIW, you might want to try one of the OMPI 1.5.2 nightly tarballs -- we 
switched the process affinity stuff to hwloc in 1.5.2 (the 1.5.1 stuff uses a 
different mechanism).
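
For example, something along these lines against a 1.5.2 nightly should show 
the new behavior (hypothetical binary name; --report-bindings just prints 
where each rank actually landed):

$ mpirun --bycore --bind-to-core --report-bindings -np 8 ./a.out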

> FYI, my testing environment and application impose these requirements for 
> optimum performance:
> 
> i. Different binaries optimized for heterogeneous machines. This necessitates 
>  MIMD, and can be done in OMPI using the -app option (providing an 
> application context file).
> ii. The application is communication-sensitive. Thus, fine-grained process 
> mapping onto *machines* and onto *cores* is required to minimize inter-machine 
> and inter-socket communication costs occurring on the network and on the 
> system bus. Specifically, processes should be mapped onto successive cores of 
> one socket before the next socket is considered, i.e., socket.0:core0-3, then 
> socket.1:core0-3. In this case, the communication among neighboring ranks 0-3 
> will be confined to socket 0 without going through the system bus. Same for 
> ranks 4-7 on socket 1. As such, the order of the cores should follow the 
> *logical* indexes.

I think that OMPI 1.5.2 should do this for you -- rather than following any 
particular logical/physical ordering, it does what you describe: it traverses 
successive cores on a socket before moving on to the next socket (which happens 
to correspond to hwloc's logical ordering, but that was not the intent).
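
And if you do stick with hwloc-bind as an external binder in the meantime, that 
layout maps directly onto hwloc's location syntax, which uses logical indexes 
by default -- rough sketch, hypothetical binary name:

$ hwloc-bind socket:0.core:0 -- ./test1    # rank 0
$ hwloc-bind socket:0.core:1 -- ./test1    # rank 1
  ...
$ hwloc-bind socket:1.core:0 -- ./test1    # rank 4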

FWIW, we have a huge revamp of OMPI's affinity support on the mpirun command 
line in the works that will offer much more flexible binding choices.

> Initially, I tried combining the features of rankfile and appfile, e.g.,
> 
> $ cat rankfile8np4
> rank 0=compute-0-8 slot=0:0
> rank 1=compute-0-8 slot=0:1
> rank 2=compute-0-8 slot=0:2
> rank 3=compute-0-8 slot=0:3
> $ cat rankfile9np4
> rank 0=compute-0-9 slot=0:0
> rank 1=compute-0-9 slot=0:1
> rank 2=compute-0-9 slot=0:2
> rank 3=compute-0-9 slot=0:3
> $ cat my_appfile_rankfile
> --host compute-0-8 -rf rankfile8np4 -np 4 ./test1
> --host compute-0-9 -rf rankfile9np4 -np 4 ./test2
> $ mpirun -app my_appfile_rankfile
> 
> but found out that only the rankfile stated on the first line took effect; 
> the second was ignored completely. After some googling and trial and error, I 
> decided to try an external binder, and that direction led me to hwloc-bind.
> 
> Maybe I should bring the issue of rankfile + appfile to the OMPI mailing list.

Yes.  

I'd have to look at it more closely, but it's possible that we only allow one 
rankfile per job -- i.e., that the rankfile should specify all the procs in the 
job, not on a per-host basis.  But perhaps we don't warn/error if multiple 
rankfiles are used; I would consider that a bug.
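
In the meantime, a single rankfile covering all 8 procs, passed once on the 
mpirun command line instead of per appfile line, might get you what you want 
(untested sketch; I'd have to check that -rf combines cleanly with -app):

$ cat rankfile_all
rank 0=compute-0-8 slot=0:0
rank 1=compute-0-8 slot=0:1
rank 2=compute-0-8 slot=0:2
rank 3=compute-0-8 slot=0:3
rank 4=compute-0-9 slot=0:0
rank 5=compute-0-9 slot=0:1
rank 6=compute-0-9 slot=0:2
rank 7=compute-0-9 slot=0:3
$ cat my_appfile
--host compute-0-8 -np 4 ./test1
--host compute-0-9 -np 4 ./test2
$ mpirun -rf rankfile_all -app my_appfile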

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

