Well, that’s a new one! I imagine we could modify the logic to allow a 
combination of oversubscribe and overload flags. Won’t get out until 2.1, 
though you could pull the patch in advance if it is holding you up.


> On Aug 23, 2016, at 11:46 PM, Ben Menadue <ben.mena...@nci.org.au> wrote:
> 
> Hi,
> 
> One of our users has noticed that binding is disabled in 2.0.0 when
> --oversubscribe is passed, which is hurting their performance, likely
> through migrations between sockets. It looks to be because of 294793c
> (PR#1228).
> 
> They need to use --oversubscribe because the developers decided, for some
> reason, to run two processes for each MPI task (a compute process
> and an I/O worker process, I think). Since the second process in the pair is
> mostly idle, there's (almost) no harm in launching two processes per core -
> and it's better than leaving half the cores idle most of the time. In
> previous versions they were binding each pair to a core and letting the
> hyper-threads argue over which of the two processes to run, since this gave
> the best performance.
> 
> I tried creating a rankfile and binding each process to its own hardware
> thread, but it refuses to launch more processes than the number of cores
> (even if all these processes are on the first socket because of the binding)
> unless --oversubscribe is passed, which in turn disables the binding. Is there a
> way of bypassing the disable-binding-if-oversubscribing check introduced by
> that commit? Or can anyone think of a better way of running this program?
> 
> Alternatively, they could leave it with no binding at the mpirun level and
> do the binding in a wrapper.
> 
> Thanks,
> Ben
> 
> 
> 
> _______________________________________________
> devel mailing list
> devel@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
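
For reference, the wrapper alternative Ben mentions (no binding at the mpirun level, binding done per process) might look roughly like the sketch below. This is only an illustration under several assumptions, not something taken from the thread: the script name bind_wrapper.py is hypothetical, it is Linux-only (os.sched_setaffinity and the sysfs topology files), it relies on the OMPI_COMM_WORLD_LOCAL_RANK environment variable that Open MPI sets for launched processes, and it assumes that consecutive local ranks form the compute/I/O pair that should share a core. Each process is pinned to all hardware threads of its core, so the two processes in a pair are left to compete for the hyper-threads as described above.

```python
#!/usr/bin/env python3
"""Hypothetical bind_wrapper.py: pin this process to one core, then exec the real program."""
import os
import sys


def parse_cpu_list(text):
    """Parse a Linux CPU list such as "0,24" or "2-3" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def core_index_for_rank(local_rank, procs_per_core=2):
    """Assumption: consecutive local ranks form a pair that shares one core."""
    return local_rank // procs_per_core


def discover_cores():
    """Group the CPUs this process may use into cores via sysfs (Linux only)."""
    cores = set()
    for cpu in os.sched_getaffinity(0):
        path = f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list"
        with open(path) as f:
            cores.add(tuple(parse_cpu_list(f.read())))
    return sorted(cores)


def main(argv):
    # Open MPI exports the node-local rank of each launched process.
    local_rank = int(os.environ["OMPI_COMM_WORLD_LOCAL_RANK"])
    cores = discover_cores()
    # Wrap around if there are more pairs than cores on this node.
    core = cores[core_index_for_rank(local_rank) % len(cores)]
    os.sched_setaffinity(0, core)   # bind to every hardware thread of that core
    os.execvp(argv[1], argv[1:])    # replace the wrapper with the real program


# Only act when a program to launch was given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv)
```

It would be launched along the lines of `mpirun --oversubscribe -np <N> python3 bind_wrapper.py ./app`, where ./app stands in for the real executable. Note that cores are ordered here by their lowest CPU number, which may not match socket order on every machine, so the mapping should be checked against the node's actual topology.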
