Hmmm...this is a tough one. It basically comes down to what we mean by relative 
locality. Initially, we meant "at what level do these procs share cpus" - 
however, coll/ml is using it as "at what level are these procs commonly bound". 
Subtle difference, but significant.

Your proposed version implements the second interpretation - even though we 
share cpus down to the hwthread level, it correctly reports that we are only 
commonly bound to the node. I'm unclear how the shared memory system (or other 
areas using that value) will respond to that change in meaning.
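
If I'm reading your patch right, the idea amounts to something like this
(rough sketch only, not your actual proc.patch - the helper name and header
path here are just illustrative):

    #include <hwloc.h>
    #include "opal/mca/hwloc/base/base.h"   /* exact header may differ */

    /* Report node-level locality whenever either proc is effectively
     * unbound (its cpuset covers every allowed PU); otherwise fall back
     * to the existing deepest-shared-object walk. */
    static opal_hwloc_locality_t
    relative_locality2(hwloc_topology_t topo, char *cpuset1, char *cpuset2)
    {
        opal_hwloc_locality_t loc;
        hwloc_const_bitmap_t avail = hwloc_topology_get_allowed_cpuset(topo);
        hwloc_bitmap_t b1 = hwloc_bitmap_alloc();
        hwloc_bitmap_t b2 = hwloc_bitmap_alloc();

        hwloc_bitmap_list_sscanf(b1, cpuset1);
        hwloc_bitmap_list_sscanf(b2, cpuset2);

        if (hwloc_bitmap_isincluded(avail, b1) ||
            hwloc_bitmap_isincluded(avail, b2)) {
            /* "all the available cpus" => only commonly bound to the node */
            loc = OPAL_PROC_ON_NODE;
        } else {
            loc = opal_hwloc_base_get_relative_locality(topo, cpuset1, cpuset2);
        }

        hwloc_bitmap_free(b1);
        hwloc_bitmap_free(b2);
        return loc;
    }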

Probably requires looking a little more broadly (just search the ompi layer for 
anything referencing the ompi_proc_t locality flag) to ensure everything can 
handle (or be adjusted to handle) the revised definition. If so, then I have no 
issue with replacing the locality algorithm.

Would also require an RFC as that might impact folks working on branches.


On Jun 19, 2014, at 11:52 PM, Gilles Gouaillardet 
<gilles.gouaillar...@iferc.org> wrote:

> Ralph,
> 
> Attached is a patch that fixes/works around my issue.
> This is more of a proof of concept, so I did not commit it to the trunk.
> 
> Basically:
> 
> opal_hwloc_base_get_relative_locality(topo, set1, set2)
> sets the locality based on the deepest topology element that is part of both
> set1 and set2. In my case, set2 means "all the available cpus", which is why
> the subroutine returns OPAL_PROC_ON_HWTHREAD.
> 
> The patch uses opal_hwloc_base_get_relative_locality2 instead: if one of the
> cpusets means "all the available cpus", then the subroutine simply returns
> OPAL_PROC_ON_NODE.
> 
> I am puzzled whether this is a bug in opal_hwloc_base_get_relative_locality,
> or in proc.c, which perhaps should not call this subroutine because it does
> not do what is expected here.
> 
> Cheers,
> 
> Gilles
> 
> On 2014/06/20 13:59, Gilles Gouaillardet wrote:
>> Ralph,
>> 
>> My test VM is a single socket with four cores.
>> Here is something odd I just found when running mpirun -np 2
>> intercomm_create:
>> tasks [0,1] are bound to cpus [0,1] => OK
>> tasks [2-3] (first spawn) are bound to cpus [2,3] => OK
>> tasks [4-5] (second spawn) are not bound (their cpuset is [0-3]) => OK
>> 
>> in ompi_proc_set_locality (ompi/proc/proc.c:228) on task 0:
>> 
>>     locality = opal_hwloc_base_get_relative_locality(opal_hwloc_topology,
>>                                                      ompi_process_info.cpuset,
>>                                                      cpu_bitmap);
>> 
>> where
>>     ompi_process_info.cpuset is "0"
>>     cpu_bitmap is "0-3"
>> 
>> and locality is set to OPAL_PROC_ON_HWTHREAD (!)
>> 
>> Is this correct?
>> 
>> I would have expected OPAL_PROC_ON_L2CACHE (since there is a single L2
>> cache on my VM, as reported by lstopo), or even OPAL_PROC_LOCALITY_UNKNOWN.
>> 
>> Then in mca_coll_ml_comm_query (ompi/mca/coll/ml/coll_ml_module.c:2899),
>> the module disqualifies itself if !ompi_rte_proc_bound.
>> If locality were previously set to OPAL_PROC_LOCALITY_UNKNOWN, coll/ml
>> could check the locality flag of all the procs in the communicator and
>> disqualify itself if at least one of them is OPAL_PROC_LOCALITY_UNKNOWN.
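>> 
>> Something along these lines (untested sketch; the exact ompi_proc_t field
>> and header names may differ):
>> 
>>     #include <stdbool.h>
>>     #include "ompi/communicator/communicator.h"
>>     #include "ompi/proc/proc.h"
>> 
>>     /* return false if at least one peer in the communicator has an
>>      * unknown locality, so coll/ml can disqualify itself */
>>     static bool all_localities_known(ompi_communicator_t *comm)
>>     {
>>         int i;
>>         for (i = 0; i < ompi_comm_size(comm); i++) {
>>             ompi_proc_t *proc = ompi_comm_peer_lookup(comm, i);
>>             if (OPAL_PROC_LOCALITY_UNKNOWN == proc->proc_flags) {
>>                 return false;
>>             }
>>         }
>>         return true;
>>     }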
>> 
>> 
>> As you wrote, there might be a bunch of other corner cases.
>> That being said, I'll try to write a simple proof of concept and see if
>> this specific hang can be avoided.
>> 
>> Cheers,
>> 
>> Gilles
>> 
>> On 2014/06/20 12:08, Ralph Castain wrote:
>>> It is related, but it means that coll/ml has a higher degree of sensitivity 
>>> to the binding pattern than what you reported (which was that coll/ml 
>>> doesn't work with unbound processes). What we are now seeing is that 
>>> coll/ml also doesn't work when processes are bound across sockets.
>>> 
>>> Which means that Nathan's revised tests are going to have to cover a lot 
>>> more corner cases. Our locality flags don't currently include 
>>> "bound-to-multiple-sockets", and I'm not sure how he is going to easily 
>>> resolve that case.
>>> 
> 
> <proc.patch>
