On 13/02/2011 04:54, Siew Yin Chan wrote:
> Good day,
>
> I'm studying the impact of MPI process binding on communication costs
> in my project, and would like to use hwloc-bind to achieve
> fine-grained mapping control. I installed hwloc 1.1.1 on a 2-socket
> machine with 4 cores per socket (2 dual-core dies in each socket), and
> ran hwloc-ps to verify the binding:
>
> $ mpirun -V
> mpirun (Open MPI) 1.5.1
> $ mpirun -np 4 hwloc-bind socket:0.core:0-3 ./test
>
> hwloc-ps shows the following output:
>
> $ hwloc-ps -p
> 1497 Socket:0 ./test
> 1498 Socket:0 ./test
> 1499 Socket:0 ./test
> 1500 Socket:0 ./test
> $ hwloc-ps -l
> 1497 Socket:0 ./test
> 1498 Socket:0 ./test
> 1499 Socket:0 ./test
> 1500 Socket:0 ./test
> $ hwloc-ps -c
> 1497 0x00000055 ./test
> 1498 0x00000055 ./test
> 1499 0x00000055 ./test
> 1500 0x00000055 ./test
>
> Questions:
> 1. Does hwloc-bind map the processes *sequentially* on *successive*
> cores of the socket?
Hello,

No. Each hwloc-bind command in the mpirun above doesn't know that there
are other hwloc-bind instances running on the same machine. All of them
bind their process to all four cores of the first socket.

> 2. How could hwloc-ps help verify this binding, i.e.,
>
> process 1497 (rank 0) on socket.0:core.0
> process 1498 (rank 1) on socket.0:core.1
> process 1499 (rank 2) on socket.0:core.2
> process 1500 (rank 3) on socket.0:core.3
>

(Let's assume your mpirun command did what you wanted.) You would get
something like this from hwloc-ps:

1497 Core:0 ./test
1498 Core:1 ./test
1499 Core:2 ./test
1500 Core:3 ./test

These core numbers are logical core numbers across the entire machine.
hwloc-ps can't easily show a hierarchical location such as socket.core
since there are many possible combinations, especially because of
caches. Actually, you might get L1Cache instead of Core above, since
hwloc-ps reports the highest object that exactly matches the process
binding (and in your machine the L1 cache sits above the Core but
covers exactly the same cpuset). If you want a different output format,
I suggest using hwloc-calc to convert the hwloc-ps output (see the
sketch at the end of this mail).

> Equivalently, does the binding of `socket:0.core:0-1
> socket:1.core:0-1' with hwloc-ps showing
>
> $ hwloc-ps -l
> 1315 L2Cache:0 L2Cache:2 ./test
> 1316 L2Cache:0 L2Cache:2 ./test
> 1317 L2Cache:0 L2Cache:2 ./test
> 1318 L2Cache:0 L2Cache:2 ./test
>
> indicate the following? I.e.,
>
> process 1315 (rank 0) on socket.0:core.0
> process 1316 (rank 1) on socket.0:core.1
> process 1317 (rank 2) on socket.1:core.0
> process 1318 (rank 3) on socket.1:core.1
>

No. Again, every process is bound to the same set of 4 cores, so
hwloc-ps shows the largest objects covering those cores (here the two
L2 caches shown above).

In the end, you want an MPI launcher that takes care of the binding
instead of having to bind manually on the command line; most MPI
launchers should be able to do this nowadays (a sketch of the manual
route is below anyway). Once the binding is per-core, hwloc-ps will
report the exact core that each process is bound to, and you might
still need to play with hwloc-calc to rewrite the hwloc-ps output the
way you want. I am thinking of adding an option to hwloc-calc to help
rewrite an arbitrary location string into socket:X.core:Y or something
like that.

Brice
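P.S. Until your launcher does the binding for you, here is a rough,
untested sketch of doing the per-rank binding by hand with a wrapper
script. It assumes Open MPI exports OMPI_COMM_WORLD_RANK in each
process' environment (recent Open MPI versions do) and that you want
rank N on core N of socket 0. The name bind-rank.sh is just an example:

  #!/bin/sh
  # bind-rank.sh: bind this process to one core of socket 0, chosen
  # from its Open MPI rank, then exec the real program.
  exec hwloc-bind socket:0.core:${OMPI_COMM_WORLD_RANK} -- "$@"

  $ chmod +x bind-rank.sh
  $ mpirun -np 4 ./bind-rank.sh ./test

Each rank then gets its own core, and hwloc-ps -l should report Core:0
through Core:3 (or the matching L1Cache objects, as explained above)
instead of Socket:0 for every process.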
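And about converting the hwloc-ps output with hwloc-calc: in the
meantime, one way to post-process the hwloc-ps -c masks is hwloc-calc's
--intersect option. This is only a sketch, assuming your hwloc-calc is
recent enough to have that option (I am not sure 1.1.1 is). It takes a
cpuset mask such as the one printed by hwloc-ps -c and prints the
logical indexes of the cores it covers:

  $ hwloc-calc --intersect core 0x00000055
  0,1,2,3

0x00000055 is the mask reported for process 1497 above, and the result
confirms that the process may run on logical cores 0-3, i.e. all four
cores of the first socket, rather than on a single core.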