I discovered "dplace" today. I don't know how many people install or use it on their clusters, but it looks interesting when the MPI implementation lacks advanced binding capabilities. For instance, you could do:

  $ mpirun -np 8 dplace 0,4,2,6,1,5,3,7 myprogram

to bind process ranks according to the machine topology.

hwloc-calc can easily generate such a list of physical processors, for instance:

  $ hwloc-calc --physical proc:all --pulist
  0,4,2,6,1,5,3,7

or even restrict it to one PU per socket with:

  $ hwloc-calc --physical socket:all.core:0 --pulist
  0,1

So hwloc-calc could be a significant help when using dplace. Maybe we should put such examples somewhere in the doc.

Brice
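
P.S. A rough sketch of how the two commands might be glued together, so that the rank count follows the length of the list. The helper names count_pus and launch_line are made up for illustration, and the hwloc-calc and dplace options are simply the ones quoted in this mail (not checked against any particular release). The helper only prints the mpirun line; it does not run anything.

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only.

# count_pus LIST: print the number of comma-separated entries in LIST.
count_pus() {
    printf '%s\n' "$1" | tr ',' '\n' | grep -c .
}

# launch_line LIST PROGRAM: print an mpirun line that starts one rank per
# entry of LIST and hands LIST to dplace, in the form used earlier in this mail.
launch_line() {
    printf 'mpirun -np %s dplace %s %s\n' "$(count_pus "$1")" "$1" "$2"
}
```

Assuming hwloc-calc is installed, one could then try something like:

  $ launch_line "$(hwloc-calc --physical proc:all --pulist)" myprogram

and review the printed command before running it.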