FWIW: speaking just to the --map-by node issue, Josh Ladd reported the problem on master as well yesterday. I’ll be looking into it on Wed.
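On the binding part of the quoted thread below: you can check what binding
mpirun actually applies with --report-bindings, and release the binding
explicitly with --bind-to none. A minimal sketch (the rank count and
"./app" are placeholders):

    # show the binding mpirun applies with one process per node
    $ mpirun -np 2 --map-by node --report-bindings ./app

    # explicitly request unbound processes, per George's suggestion
    $ mpirun -np 2 --map-by node --bind-to none ./app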
> On Apr 12, 2016, at 5:53 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
>
> On Wed, Apr 13, 2016 at 1:59 AM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
>> George,
>>
>> about the process binding part
>>
>> On 4/13/2016 7:32 AM, George Bosilca wrote:
>>> Also my processes, despite the fact that I asked for 1 per node, are
>>> not bound to the first core. Shouldn’t we release the process binding
>>> when we know there is a single process per node (as in the above case)?
>>
>> did you expect the tasks are bound to the first *core* on each node?
>>
>> i would expect the tasks are bound to the first *socket* on each node.
>
> In this particular instance, where it has been explicitly requested to
> have a single process per node, I would have expected the process to be
> unbound (we know there is only one per node). It is the responsibility of
> the application to bind itself or its threads if necessary. Why are we
> enforcing a particular binding policy?
>
>> since we do not know how many (OpenMP or other) threads will be used by
>> the application, --bind-to socket is a good policy imho. in this case
>> (one task per node), no binding at all would mean the task can migrate
>> from one socket to the other, and/or OpenMP threads are bound across
>> sockets. That would trigger some NUMA effects (better bandwidth if
>> memory is locally accessed, but worse performance if memory is allocated
>> only on one socket). so imho, --bind-to socket is still my preferred
>> policy, even if there is only one MPI task per node.
>
> Open MPI is about MPI ranks/processes. I don't think it is our job to try
> to figure out what the user does with its own threads.
>
> Your justification makes sense if the application only uses a single
> socket. It also makes sense if one starts multiple ranks per node, and
> the internal threads of each MPI process inherit the MPI process binding.
> However, in the case where there is a single process per node, because
> there is a mismatch between the number of resources available (hardware
> threads) and the binding of the parent process, all the threads of the
> MPI application are [by default] bound on a single socket.
>
>   George.
>
> PS: That being said, I think I'll need to implement the binding code
> anyway in order to deal with the wide variety of behaviors in the
> different MPI implementations.
>
>> Cheers,
>>
>> Gilles
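For the hybrid OpenMP case Gilles raises, a middle ground is to leave the
rank unbound and let the OpenMP runtime place the threads. A sketch,
assuming an OpenMP 4.0 runtime (the thread and rank counts are
illustrative; -x is Open MPI's flag for exporting environment variables):

    # unbound ranks; OpenMP spreads its threads across the whole node
    # instead of inheriting a single-socket binding
    $ mpirun -np 2 --map-by node --bind-to none \
          -x OMP_NUM_THREADS=16 -x OMP_PROC_BIND=spread ./app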