Yes I still have that in mind, but I am not a big fan of adding yet two other distribute functions. It's a pitty that we need 4 of them to get everything done. If we multiply by 2 every 2 years, it won't scale :) The current interface doesn't well support distributing N objects among M child when M > N, that's causing your request (fixing this + singlify_last would be enough), and that may cause other requests in the future. But that's hard to fix properly. I would rather deprecate the existing functions, and make a new single one (needs a new name) that won't need yet another fix in two years. At least, it'll take a flag argument (reverse would be one of this flag).
singlify_reverse() (or _last() or whatever good name) is ok (just needs some additional tests for make check). Brice Le 18/09/2013 14:42, Jiri Hladky a écrit : > Hi Brice, > hi Samuel, > > hopefully you still remember discussion we had regarding a proposed > new option > --reverse_direction > or perhaps better > --start_from_last_object > > which would force hwloc-distrib to start assigning PUs from the last > object (rather than from PU#0 as the default) > > I have now implemented the change on the latest trunk. > > svn co http://svn.open-mpi.org/svn/hwloc/trunk hwloc-trunk > > I had to modify only three functions > > * hwloc_distributev in helper.h > * hwloc_distribute in helper.h > * hwloc_bitmap_singlify in bitmap.c > > The change is trivial. In helper.h I had to only change the direction > of loops, ie from > > for(i=0;i<N;++i) { > anything[i] = > to > for(i=N;i>0;--i) > anything[i-1] = > > (Please be careful here as "i" isunsigned int and it will wrap around > when doing i-- for i == 0 . Alternative code would be > |for (size_t i = n-1; i < n ; --i) { ... }| > |which has the benefit that you don't need to change i to i-1 in the loop > body but it's for statement is less readable)| > In hwloc_bitmap_singlify I had to > > * again change to loop direction > * replace > > int _ffs = hwloc_ffsl(w); > set->ulongs[i] = HWLOC_SUBBITMAP_CPU(_ffs-1); > with > int _fls = hwloc_flsl(w); > set->ulongs[i-1] = HWLOC_SUBBITMAP_CPU(_fls-1); > > The patch file is attached. > > Could you please review it if you agree to add it to the 1.8? > > Other changes would require: > > * add the new command line option to hwloc-distrib > * clone > functions hwloc_distributev, hwloc_distribute, hwloc_bitmap_singlify > to hwloc_distributev_reverse, hwloc_distribute_reverse, > hwloc_bitmap_singlify_reverse > and when a command line option is specified, call these new functions > > I have tested the new code on 4 Socket system with 15 cores per socket > and 30 PUs per socket and it works as expected: > > utils/hwloc-distrib --single --taskset 64 > 0x800000000000000000000000000000 > 0x400000000000000000000000000000 > 0x100000000000000000000000000 > 0x20000000000000000000000 > 0x4000000000000000000 > > It will assign the jobs in this order > > 1. last PU on the last core on Socket #4 > 2. last PU on the last core Socket #3 > 3. last PU on the last core Socket #2 > 4. last PU on the last core Socket #1 > 5. last PU on the second last core on Socket #4 > > Please let me know what you think about it. I would really appreciate > if you can include it into the official code. > > Thanks a lot! > Jirka > > > > > On Thu, Aug 29, 2013 at 9:58 AM, Brice Goglin <brice.gog...@inria.fr > <mailto:brice.gog...@inria.fr>> wrote: > > Le 28/08/2013 16:20, Jiri Hladky a écrit : >> I got your point. On the other hand I think that hwloc-distrib is >> at the moment not flexible enough to handle such case. I believe >> that the current strategy - start from the first object - is not >> the best one. From my experience, core 0 is always most used by >> the system so it seems that better strategy would to allocate the >> cores from the last one. > > Most people expect their jobs to be launched in order: process0 on > first cores, process1 on next cores, etc. > > Again, you don't want to reverse the output, you want to ignore > first core/socket if possible. > > >> I was looking at the source code of the hwloc-distrib and I >> believe that only this part of the code would be affected: >> >> for (i = 0; i < chunks; i++) >> roots[i] = hwloc_get_obj_by_depth(topology, from_depth, >> i); => change this to roots[i] = >> hwloc_get_obj_by_depth(topology, from_depth, MAX_COUNT - i); >> >> hwloc_distributev(topology, roots, chunks, cpuset, n, >> to_depth); => rewrite this to iterate in the reverse direction > > You can do the exact same thing by reversing your loop in the caller. > > Anyway, reversing the loop just move the core you don't want to > the end of the list. But if you use the entire list, you end up > using the exact same cores. You just reordered the processes among > these cores. > > >> Am I missing something? In case of infinite bitmap hwloc-distrib >> will error out. This should solve the problems >> with hwloc_bitmap_singlify. > > We need a new singlify() above, one that doesn't use the first > bit. That's what will make you use a core that is not the first > one in each socket. > > Problems with infinite bitmaps are unrelated with hwloc-distrib, > but they need to be handled in that new bitmap API anyway. > > Brice > > > _______________________________________________ > hwloc-devel mailing list > hwloc-de...@open-mpi.org <mailto:hwloc-de...@open-mpi.org> > http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-devel > >