Hi Samuel, thanks for looking into it! I'm using hwloc_distribute to distribute parallel jobs on multi-socket systems.
Usually it gives nice results: running hwloc-distrib --single <N> on a box with <N> sockets will distribute one job per socket. This is what I want. hwloc-distrib --single <2*N> will distribute 2 jobs per socket, picking up PUs wisely.

It breaks, however, on strange systems. Please check with lstopo --input or hwloc-distrib --input on the topology I sent you with my last e-mail (hp-dl980g7-01.tar.bz2, sent on Tuesday 09:30:37 pm).

This box has a broken NUMA topology: there are 7 sockets in one NUMA node and 1 socket in another NUMA node. My goal is to distribute one job per socket with the command

    hwloc-distrib --single 8

This is not working. I have tried various --among and --ignore options to achieve this, but without success.

Please try

    hwloc-distrib --input hp-dl980g7-01 --single 8

with the data I sent you on Tuesday (tar jxvf hp-dl980g7-01.tar.bz2). The goal is to distribute one job per socket.

Thanks!
Jirka

On Tuesday, November 16, 2010 10:20:38 pm Samuel Thibault wrote:
> Samuel Thibault, on Tue 16 Nov 2010 22:18:54 +0100, wrote:
> > Also note that currently the hwloc_distribute() function doesn't take
> > e.g. the number of PUs into account when splitting elements over the
> > hierarchy. It was more a demonstration example than something to be used
> > as is. We can however extend it, we just need to know what's desired.
>
> Reading your mail again, I guess that's where your issue actually lied.
>
> Samuel
> _______________________________________________
> hwloc-devel mailing list
> hwloc-de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-devel
hp-dl980g7-01.tar.bz2
Description: application/bzip-compressed-tar
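
To illustrate the point in the quoted reply (splitting over the hierarchy without taking the size of each subtree into account), here is a minimal Python sketch. It is a toy model, not hwloc's code: the Node class, split_evenly, split_weighted and the socket names S0..S7 are all hypothetical, and the tree stops at socket level. It only assumes the layout described above, 7 sockets under one NUMA node and 1 socket under the other.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a topology object (not an hwloc type)."""
    name: str
    children: list = field(default_factory=list)

    def leaf_count(self):
        # A leaf counts as 1; otherwise sum the leaves below this node.
        if not self.children:
            return 1
        return sum(c.leaf_count() for c in self.children)


def split_evenly(node, n):
    """Give every child the same share of n jobs, ignoring subtree size."""
    if n == 0:
        return []
    if not node.children:
        return [node.name] * n
    base, extra = divmod(n, len(node.children))
    placed = []
    for i, child in enumerate(node.children):
        placed += split_evenly(child, base + (1 if i < extra else 0))
    return placed


def split_weighted(node, n):
    """Give every child a share of n jobs proportional to its leaf count."""
    if n == 0:
        return []
    if not node.children:
        return [node.name] * n
    total = node.leaf_count()
    placed, given, seen = [], 0, 0
    for child in node.children:
        seen += child.leaf_count()
        target = (n * seen) // total  # cumulative share, so shares sum to n
        placed += split_weighted(child, target - given)
        given = target
    return placed


# Assumed layout from the report: 7 sockets in one NUMA node, 1 in the other.
numa0 = Node("NUMA0", [Node(f"S{i}") for i in range(7)])
numa1 = Node("NUMA1", [Node("S7")])
machine = Node("Machine", [numa0, numa1])

if __name__ == "__main__":
    print("even:    ", split_evenly(machine, 8))
    print("weighted:", split_weighted(machine, 8))
```

In this model the even split hands 4 jobs to each NUMA node, so 4 jobs pile up on S7 while three sockets in the other node stay idle; the weighted split hands out 7 and 1, which gives one job per socket. Whether weighting by sockets, cores or PUs is the right extension for hwloc_distribute() is a separate question that this sketch does not settle.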