Hi Brice,

First, my apologies if this email starts a new thread. For some reason I never received your response through Mailman and can only see it through the web archive interface, so I'm constructing this response without headers like "In-Reply-To".
Thank you for your very helpful response. I'll use your explanation of the algorithm and try to understand the implementation. I was indeed expecting hwloc-distrib to help me bind multithreaded processes, although I can certainly understand that this is considered a corner case. Could you please consider fixing this?

Thanks,
Tim

Brice Goglin wrote:
> Hello,
>
> This is the main corner case of hwloc-distrib. It can return objects
> only, not groups of objects. The distrib algorithm is:
> 1) start at the root, where there are M children, and you have to
>    distribute N processes
> 2) if there are no children, or if N is 1, return the entire object
> 3) split N into M pieces Ni (N = sum of Ni) based on each child's
>    weight (the number of PUs under each)
>    If N>=M, all Ni can be > 0, all children will get some process
>    If N<M, you can't split N into M nonzero integer pieces, some Ni
>    will be 0, these objects won't get any process
> 4) go back to (2) and recurse in each child object with Ni instead of N
>
> Your case is step 3 with N=2 and M=4. It basically means that we
> distribute across cores without "assembling groups of cores if needed".
>
> In your case, when you bind to 2 cores of 4 PUs each, your task only
> uses one PU in the end, 1 core and 3 PUs are ignored as well. They *may*
> be used, but the operating system scheduler is free to ignore them. So
> binding to 2 cores or binding to 1 core or binding to 1 PU is almost
> equivalent. At least the latter is included in the former. And most
> people pass --single to get a single PU anyway.
>
> The case where it's not equivalent is when you bind multithreaded
> processes. If you have 8 threads, it's better to use 2 cores than 1
> single one. If this case matters to you, I will look into fixing this
> corner case.
>
> Brice
>
> On 30/03/2014 07:56, Tim Creech wrote:
> > Hello,
> > I would like to use hwloc_distrib for a project, but I'm having some
> > trouble understanding how it distributes. Specifically, it seems to
> > avoid distributing multiple processes across cores, and I'm not sure
> > why.
> >
> > As an example, consider the actual output of:
> >
> > $ hwloc-distrib -i "4 4" 2
> > 0x0000000f
> > 0x000000f0
> >
> > I'm expecting hwloc-distrib to tell me how to distribute 2 processes
> > across the 16 PUs (4 cores by 4 PUs), but the answer only involves 8
> > PUs, leaving the other 8 unused. If there were more cores on the
> > machine, then potentially the vast majority of them would be unused.
> >
> > In other words, I might expect the output to use all of the PUs across
> > cores, for example:
> >
> > $ hwloc-distrib -i "4 4" 2
> > 0x000000ff
> > 0x0000ff00
> >
> > Why does hwloc-distrib leave PUs unused? I'm using hwloc-1.9. Any help
> > in understanding where I'm going wrong is greatly appreciated!
> >
> > Thanks,
> > Tim
> >
> > _______________________________________________
> > hwloc-users mailing list
> > hwloc-users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
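
As a way of checking the four steps quoted above, the recursion can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not hwloc's implementation: the `Obj` class and the `synthetic`, `weight` and `distrib` helpers are hypothetical names, the proportional split in step 3 is assumed to round each share up against the weight still remaining, and a childless object asked to host several processes is assumed to hand the same mask to each of them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Obj:
    """Hypothetical stand-in for a topology object (not an hwloc type)."""
    cpuset: int                                   # bitmask of PUs below this object
    children: List["Obj"] = field(default_factory=list)


def synthetic(arities: List[int], first_pu: int = 0) -> Obj:
    """Build a regular tree from per-level arities, e.g. [4, 4] for "4 4"."""
    if not arities:
        return Obj(cpuset=1 << first_pu)          # a single PU
    pus_per_child = 1
    for arity in arities[1:]:
        pus_per_child *= arity
    children = [synthetic(arities[1:], first_pu + i * pus_per_child)
                for i in range(arities[0])]
    cpuset = 0
    for child in children:
        cpuset |= child.cpuset
    return Obj(cpuset, children)


def weight(obj: Obj) -> int:
    """Number of PUs under an object."""
    return bin(obj.cpuset).count("1")


def distrib(obj: Obj, n: int) -> List[int]:
    """Return n cpuset masks following steps 1-4 of the quoted description."""
    # Step 2: no children, or a single process: return the entire object.
    if not obj.children or n == 1:
        return [obj.cpuset] * n
    # Step 3: split n among the children in proportion to their weight.
    # Assumption: each share is rounded up against the remaining weight,
    # so the shares are integers that always sum to n.
    masks: List[int] = []
    remaining_weight = weight(obj)
    for child in obj.children:
        w = weight(child)
        share = (n * w + remaining_weight - 1) // remaining_weight
        if share:
            # Step 4: recurse into the child with its share Ni.
            masks.extend(distrib(child, share))
        n -= share
        remaining_weight -= w
    return masks


for mask in distrib(synthetic([4, 4]), 2):
    print(f"0x{mask:08x}")
# prints:
# 0x0000000f
# 0x000000f0
```

With N=2 and M=4, the first two cores each receive a share of 1 and the last two receive 0, so the sketch reproduces the two single-core masks from the example and never groups cores together.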