Ralph, I still observe these issues in the current master. (npernode is not respected either).
Also note that the display_allocation seems to be wrong (slots_inuse=0 when the slot is obviously in use).

$ git show
4899c89 (HEAD -> master, origin/master, origin/HEAD) Fix a race condition when multiple threads try to create a bml en...  Bouteiller, 6 hours ago

$ bin/mpirun -np 12 -hostfile /opt/etc/ib10g.machinefile.ompi -display-allocation -map-by node hostname

======================   ALLOCATED NODES   ======================
	dancer00: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
	dancer01: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
	dancer02: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
	dancer03: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
	dancer04: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
	dancer05: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
	dancer06: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
	dancer07: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
	dancer08: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
	dancer09: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
	dancer10: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
	dancer11: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
	dancer12: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
	dancer13: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
	dancer14: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
	dancer15: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
=================================================================
dancer01
dancer00
dancer01
dancer01
dancer01
dancer00
dancer00
dancer00
dancer00
dancer00
dancer00
dancer00
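A possible cross-check (a sketch only; it reuses the hostfile above together with the standard -npernode, -display-map and --report-bindings options) is to request the per-node count explicitly and have mpirun print the map it computed:

$ mpirun -np 12 -npernode 1 -hostfile /opt/etc/ib10g.machinefile.ompi \
      -display-map --report-bindings hostname
# With round-robin mapping working, each of the first 12 nodes in the hostfile
# should host exactly one rank, and the printed map and binding report should show it.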
--
Aurélien Bouteiller, Ph.D. ~~ https://icl.cs.utk.edu/~bouteill/

> On Apr 13, 2016, at 1:38 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
> The —map-by node option should now be fixed on master, and PRs waiting for 1.10 and 2.0
>
> Thx!
>
>> On Apr 12, 2016, at 6:45 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>> FWIW: speaking just to the —map-by node issue, Josh Ladd reported the problem on master as well yesterday. I'll be looking into it on Wed.
>>
>>> On Apr 12, 2016, at 5:53 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>
>>> On Wed, Apr 13, 2016 at 1:59 AM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
>>> George,
>>>
>>> about the process binding part:
>>>
>>> On 4/13/2016 7:32 AM, George Bosilca wrote:
>>> Also my processes, despite the fact that I asked for 1 per node, are not bound to the first core. Shouldn't we release the process binding when we know there is a single process per node (as in the above case)?
>>>
>>> did you expect the tasks to be bound to the first *core* on each node?
>>>
>>> i would expect the tasks to be bound to the first *socket* on each node.
>>>
>>> In this particular instance, where it has been explicitly requested to have a single process per node, I would have expected the process to be unbound (we know there is only one per node). It is the responsibility of the application to bind itself or its threads if necessary. Why are we enforcing a particular binding policy?
>>>
>>> since we do not know how many (OpenMP or other) threads will be used by the application, --bind-to socket is a good policy imho. In this case (one task per node), no binding at all would mean the task can migrate from one socket to the other, and/or OpenMP threads are bound across sockets. That would trigger some NUMA effects (better bandwidth if memory is accessed locally, but worse performance if memory is allocated on only one socket). So imho, --bind-to socket is still my preferred policy, even if there is only one MPI task per node.
>>>
>>> Open MPI is about MPI ranks/processes. I don't think it is our job to try to figure out what the user does with their own threads.
>>>
>>> Your justification makes sense if the application only uses a single socket. It also makes sense if one starts multiple ranks per node, and the internal threads of each MPI process inherit the MPI process binding. However, in the case where there is a single process per node, because there is a mismatch between the number of resources available (hardware threads) and the binding of the parent process, all the threads of the MPI application are [by default] bound on a single socket.
>>>
>>> George.
>>>
>>> PS: That being said I think I'll need to implement the binding code anyway in order to deal with the wide variety of behaviors in the different MPI implementations.
>>>
>>> Cheers,
>>>
>>> Gilles
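On the binding question, a minimal way to see what each rank actually ends up bound to (a sketch only; --bind-to and --report-bindings are the usual mpirun options, and Cpus_allowed_list is Linux-specific):

$ mpirun -np 2 -npernode 1 --bind-to none --report-bindings \
      sh -c 'grep Cpus_allowed_list /proc/self/status'
# --report-bindings prints what the launcher applied to each rank;
# the grep shows the affinity mask the process itself inherited.
# With --bind-to none both should span all hardware threads on the node,
# with --bind-to socket only the cores of one socket.

Since threads spawned later (OpenMP or otherwise) inherit that mask, comparing the two runs shows whether they would be confined to a single socket by default.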