Well, that’s a new one! I imagine we could modify the logic to allow a combination of oversubscribe and overload flags. Won’t get out until 2.1, though you could pull the patch in advance if it is holding you up.
> On Aug 23, 2016, at 11:46 PM, Ben Menadue <ben.mena...@nci.org.au> wrote:
>
> Hi,
>
> One of our users has noticed that binding is disabled in 2.0.0 when
> --oversubscribe is passed, which is hurting their performance, likely
> through migrations between sockets. It looks to be because of 294793c
> (PR#1228).
>
> They need to use --oversubscribe as for some reason the developers decided
> to run two processes for each MPI task for some reason (a compute process
> and an I/O worker process, I think). Since the second process in the pair is
> mostly idle, there's (almost) no harm in launching two processes per core -
> and it's better than leaving half the cores idle most of the time. In
> previous versions they were binding each pair to a core and letting the
> hyper-threads argue over which of the two processes to run, since this gave
> the best performance.
>
> I tried creating a rankfile and binding each process to its own hardware
> thread, but it refuses to launch more processes than the number of cores
> (even if all these processes are on the first socket because of the binding)
> unless --oversubscribe is passed, and thus disabling the binding. Is there a
> way of bypassing the disable-binding-if-oversubscribing check introduced by
> that commit? Or can anyone think of a better way of running this program?
>
> Alternatively, they could leave it with no binding at the mpirun level and
> do the binding in a wrapper.
>
> Thanks,
> Ben
>
> _______________________________________________
> devel mailing list
> devel@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
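
For the wrapper option mentioned at the end of the quoted message, here is a rough, Linux-only sketch of what such a wrapper might look like. It is an illustration under assumptions, not a tested recipe: it assumes consecutive local ranks form the compute/I/O pair, that the launcher exports OMPI_COMM_WORLD_LOCAL_RANK (Open MPI's mpirun does), and that hardware threads c and c + N/2 are siblings on the same core, which is a common numbering but should be checked with lstopo. The script name bind_wrapper.py and the helper cpus_for_local_rank are hypothetical.

```python
import os
import sys


def cpus_for_local_rank(local_rank, core_to_cpus, procs_per_core=2):
    """Return the set of CPU ids that a local rank should be bound to.

    Consecutive local ranks are grouped procs_per_core at a time, and each
    group shares all hardware threads of one core (assumed pairing).
    """
    core_index = local_rank // procs_per_core
    if core_index >= len(core_to_cpus):
        raise ValueError(
            "local rank %d needs core %d, but only %d cores are listed"
            % (local_rank, core_index, len(core_to_cpus))
        )
    return set(core_to_cpus[core_index])


def main(argv):
    # Set by Open MPI's mpirun for every launched process.
    local_rank = int(os.environ["OMPI_COMM_WORLD_LOCAL_RANK"])

    # ASSUMPTION: hardware threads c and c + half sit on the same core.
    # Verify against the real topology before relying on this.
    n_cpus = os.cpu_count() or 1
    half = n_cpus // 2
    core_to_cpus = [(c, c + half) for c in range(half)]

    # Bind this process (pid 0 means "the caller") to its core's threads,
    # then replace the wrapper with the real program; affinity is inherited.
    os.sched_setaffinity(0, cpus_for_local_rank(local_rank, core_to_cpus))
    os.execvp(argv[1], argv[1:])


# Only act as a wrapper when a program to launch was actually given.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv)
```

It would be launched along the lines of `mpirun --oversubscribe -np 32 python3 bind_wrapper.py ./app`, where ./app stands in for the real executable and the process count is only an example. Each pair then shares one core and the kernel decides which hardware thread runs which process, which is close to the behavior described for the earlier versions.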