On Feb 14, 2011, at 9:35 AM, Siew Yin Chan wrote:

> 1. I tried Open MPI 1.5.1 before turning to hwloc-bind. Yep. Open MPI 1.5.1
> does provide the --bycore and --bind-to-core options, but these seem to bind
> processes to cores on my machine according to the *physical* indexes:

FWIW, you might want to try one of the OMPI 1.5.2 nightly tarballs -- we
switched the process affinity stuff to hwloc in 1.5.2 (the 1.5.1 stuff uses a
different mechanism).

> FYI, my testing environment and application impose these requirements for
> optimum performance:
>
> i. Different binaries optimized for heterogeneous machines. This necessitates
> MIMD, and can be done in OMPI using the -app option (providing an application
> context file).
>
> ii. The application is communication-sensitive. Thus, fine-grained process
> mapping on *machines* and on *cores* is required to minimize inter-machine
> and inter-socket communication costs occurring on the network and on the
> system bus. Specifically, processes should be mapped onto successive cores of
> one socket before the next socket is considered, i.e., socket.0:core0-3, then
> socket.1:core0-3. In this case, the communication among neighboring ranks 0-3
> will be confined to socket 0 without going through the system bus. Same for
> ranks 4-7 on socket 1. As such, the order of the cores should follow the
> *logical* indexes.

I think that OMPI 1.5.2 should do this for you -- rather than following any
logical/physical ordering, it does what you describe: it traverses successive
cores on a socket before going to the next socket (which happens to correspond
to hwloc's logical ordering, but that was not the intent).

FWIW, we have a huge revamp of OMPI's affinity support coming on the mpirun
command line that will offer much more flexible binding choices.

> Initially, I tried combining the features of rankfile and appfile, e.g.,
>
> $ cat rankfile8np4
> rank 0=compute-0-8 slot=0:0
> rank 1=compute-0-8 slot=0:1
> rank 2=compute-0-8 slot=0:2
> rank 3=compute-0-8 slot=0:3
>
> $ cat rankfile9np4
> rank 0=compute-0-9 slot=0:0
> rank 1=compute-0-9 slot=0:1
> rank 2=compute-0-9 slot=0:2
> rank 3=compute-0-9 slot=0:3
>
> $ cat my_appfile_rankfile
> --host compute-0-8 -rf rankfile8np4 -np 4 ./test1
> --host compute-0-9 -rf rankfile9np4 -np 4 ./test2
>
> $ mpirun -app my_appfile_rankfile
>
> but found out that only the rankfile stated on the first line took effect;
> the second was ignored completely. After some googling and trial and error,
> I decided to try an external binder, and this direction led me to hwloc-bind.
>
> Maybe I should bring the issue of rankfile + appfile to the OMPI mailing list.

Yes. I'd have to look at it more closely, but it's possible that we only allow
one rankfile per job -- i.e., that the rankfile should specify all the procs in
the job, not on a per-host basis. But perhaps we don't warn/error if multiple
rankfiles are used; I would consider that a bug.
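Along those lines, if the one-rankfile-per-job restriction is indeed what's
biting you, a single rankfile covering all eight procs might be worth a try.
A rough, untested sketch -- it assumes the two app-context lines end up as
ranks 0-3 and 4-7, reuses the hosts/slots from your per-host rankfiles, and
moves -rf onto the mpirun line (the names my_rankfile and my_appfile are just
placeholders):

$ cat my_rankfile
rank 0=compute-0-8 slot=0:0
rank 1=compute-0-8 slot=0:1
rank 2=compute-0-8 slot=0:2
rank 3=compute-0-8 slot=0:3
rank 4=compute-0-9 slot=0:0
rank 5=compute-0-9 slot=0:1
rank 6=compute-0-9 slot=0:2
rank 7=compute-0-9 slot=0:3

$ cat my_appfile
--host compute-0-8 -np 4 ./test1
--host compute-0-9 -np 4 ./test2

$ mpirun -rf my_rankfile -app my_appfile

I can't promise that -rf and -app compose cleanly in 1.5.x, so treat this as a
sketch of the idea rather than a known-good recipe.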
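Back on the logical-vs-physical indexing point: with a 1.5.2 nightly, the plain
mapping/binding options should already give you the socket-major placement you
describe. The line below is illustrative only (./a.out is a placeholder binary;
--report-bindings just prints where each rank actually landed, which is the
easiest way to confirm the placement):

$ mpirun -np 8 --bycore --bind-to-core --report-bindings ./a.out

And on the hwloc side, hwloc-bind location strings such as socket:0.core:1 are
interpreted with logical indexes by default, e.g.

$ hwloc-bind socket:0.core:1 -- ./test1

binds the process to the second core (logical index 1) of the first socket.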
--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/