Yes I still have that in mind, but I am not a big fan of adding yet two
other distribute functions. It's a pitty that we need 4 of them to get
everything done. If we multiply by 2 every 2 years, it won't scale :)
The current interface doesn't well support distributing N objects among
M child when M > N, that's causing your request (fixing this +
singlify_last would be enough), and that may cause other requests in the
future. But that's hard to fix properly.
I would rather deprecate the existing functions, and make a new single
one (needs a new name) that won't need yet another fix in two years. At
least, it'll take a flag argument (reverse would be one of this flag).

singlify_reverse() (or _last() or whatever good name) is ok (just needs
some additional tests for make check).

Brice




Le 18/09/2013 14:42, Jiri Hladky a écrit :
> Hi Brice,
> hi Samuel,
>
> hopefully you still remember discussion we had regarding a proposed
> new option 
> --reverse_direction
> or perhaps better
> --start_from_last_object
>
> which would force hwloc-distrib to start assigning PUs from the last
> object (rather than from PU#0 as the default)
>
> I have now implemented the change on the latest trunk. 
>
> svn co http://svn.open-mpi.org/svn/hwloc/trunk hwloc-trunk
>
> I had to modify only three functions
>
>   * hwloc_distributev in helper.h
>   * hwloc_distribute in helper.h
>   * hwloc_bitmap_singlify in bitmap.c
>
> The change is trivial. In helper.h I had to only change the direction
> of loops, ie from
>
> for(i=0;i<N;++i) {
>   anything[i] = 
> to
> for(i=N;i>0;--i)
>   anything[i-1] = 
>
> (Please be careful here as "i" isunsigned int and it will wrap around
> when doing i-- for i == 0 . Alternative code would be
> |for (size_t i = n-1; i < n ; --i) { ... }|
> |which has the benefit that you don't need to change i to i-1 in the loop 
> body but it's for statement is less readable)|
> In hwloc_bitmap_singlify I had to
>
>   * again change to loop direction
>   * replace 
>
> int _ffs = hwloc_ffsl(w);
> set->ulongs[i] = HWLOC_SUBBITMAP_CPU(_ffs-1);
> with
> int _fls = hwloc_flsl(w);
> set->ulongs[i-1] = HWLOC_SUBBITMAP_CPU(_fls-1);
>
> The patch file is attached. 
>
> Could you please review it if you agree to add it to the 1.8?
>
> Other changes would require:
>
>   * add the new command line option to hwloc-distrib
>   * clone
>     functions hwloc_distributev, hwloc_distribute, hwloc_bitmap_singlify
>     to hwloc_distributev_reverse, hwloc_distribute_reverse, 
> hwloc_bitmap_singlify_reverse
>     and when a command line option is specified, call these new functions
>
> I have tested the new code on 4 Socket system with 15 cores per socket
> and 30 PUs per socket and it works as expected:
>
> utils/hwloc-distrib --single --taskset 64
> 0x800000000000000000000000000000
> 0x400000000000000000000000000000
> 0x100000000000000000000000000
> 0x20000000000000000000000
> 0x4000000000000000000
>
> It will assign the jobs in this order
>
>  1. last PU on the last core on Socket #4
>  2. last PU on the last core Socket #3
>  3. last PU on the last core Socket #2
>  4. last PU on the last core Socket #1
>  5. last PU on the second last core on Socket #4
>
> Please let me know what you think about it. I would really appreciate
> if you can include it into the official code.
>
> Thanks a lot!
> Jirka
>
>
>
>
> On Thu, Aug 29, 2013 at 9:58 AM, Brice Goglin <brice.gog...@inria.fr
> <mailto:brice.gog...@inria.fr>> wrote:
>
>     Le 28/08/2013 16:20, Jiri Hladky a écrit :
>>     I got your point. On the other hand I think that hwloc-distrib is
>>     at the moment not flexible enough to handle such case. I believe
>>     that the current strategy - start from the first object - is not
>>     the best one. From my experience, core 0 is always most used by
>>     the system so it seems that better strategy would to allocate the
>>     cores from the last one.
>
>     Most people expect their jobs to be launched in order: process0 on
>     first cores, process1 on next cores, etc.
>
>     Again, you don't want to reverse the output, you want to ignore
>     first core/socket if possible.
>
>
>>     I was looking at the source code of the hwloc-distrib and I
>>     believe that only this part of the code would be affected:
>>
>>           for (i = 0; i < chunks; i++)
>>             roots[i] = hwloc_get_obj_by_depth(topology, from_depth,
>>     i);  => change this to roots[i] =
>>     hwloc_get_obj_by_depth(topology, from_depth, MAX_COUNT - i);
>>
>>           hwloc_distributev(topology, roots, chunks, cpuset, n,
>>     to_depth); => rewrite this to iterate in the reverse direction
>
>     You can do the exact same thing by reversing your loop in the caller.
>
>     Anyway, reversing the loop just move the core you don't want to
>     the end of the list. But if you use the entire list, you end up
>     using the exact same cores. You just reordered the processes among
>     these cores.
>
>
>>     Am I missing something? In case of infinite bitmap hwloc-distrib
>>     will error out. This should solve the problems
>>     with hwloc_bitmap_singlify.
>
>     We need a new singlify() above, one that doesn't use the first
>     bit. That's what will make you use a core that is not the first
>     one in each socket.
>
>     Problems with infinite bitmaps are unrelated with hwloc-distrib,
>     but they need to be handled in that new bitmap API anyway.
>
>     Brice
>
>
>     _______________________________________________
>     hwloc-devel mailing list
>     hwloc-de...@open-mpi.org <mailto:hwloc-de...@open-mpi.org>
>     http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-devel
>
>

Reply via email to