Hmmm... bet I know why. Let me poke a bit.

> On Aug 24, 2016, at 5:18 PM, Ben Menadue <ben.mena...@nci.org.au> wrote:
>
> Actually, adding :oversubscribe to the --map-by option still disables
> binding, even with :overload on the --bind-to option. While the :overload
> option allows binding more than one process per CPU, it only has an effect
> if binding actually happens - i.e. without :oversubscribe.
>
> So, on one of our login nodes (2x8-core),
>
>     mpirun --np 32 --bind-to core:overload --report-bindings true
>
> works and does what you would expect (0 and 16 on core 0, 1 and 17 on
> core 1, ...), while inside a PBS job on a compute node (same hardware) it
> fails with "not enough slots available in the system". Adding --map-by
> core:oversubscribe makes it work, but then binding is disabled.
>
> Cheers,
> Ben
>
> -----Original Message-----
> From: devel [mailto:devel-boun...@lists.open-mpi.org] On Behalf Of Ben Menadue
> Sent: Thursday, 25 August 2016 9:36 AM
> To: 'Open MPI Developers' <devel@lists.open-mpi.org>
> Subject: Re: [OMPI devel] Binding with --oversubscribe in 2.0.0
>
> Hi Ralph,
>
> Thanks for that... that option's not on the man page for mpirun, but I can
> see it in the --help message (as "overload-allowed", which also works).
>
> Cheers,
> Ben
>
> -----Original Message-----
> From: devel [mailto:devel-boun...@lists.open-mpi.org] On Behalf Of
> r...@open-mpi.org
> Sent: Thursday, 25 August 2016 2:03 AM
> To: OpenMPI Devel <devel@lists.open-mpi.org>
> Subject: Re: [OMPI devel] Binding with --oversubscribe in 2.0.0
>
> Actually, I stand corrected! Someone must have previously requested it,
> because support already exists.
>
> What you need to do is simply specify the desired binding. If you don't
> specify one, then we disable binding by default when oversubscribed. This
> was done to protect performance for those who don't plan for such
> scenarios and don't realize we are otherwise binding by default.
>
> So in your case, you'd want something like:
>
>     mpirun --map-by core:oversubscribe --bind-to core:overload
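>
> A quick way to check is to add --report-bindings (a minimal sketch only:
> the 32-rank count assumes your 2x8-core nodes, and ./a.out is a
> placeholder for the real binary); each core should then show up with two
> ranks bound to it:
>
>     mpirun -np 32 --map-by core:oversubscribe --bind-to core:overload \
>         --report-bindings ./a.out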
>
> HTH
> Ralph
>
>> On Aug 24, 2016, at 7:33 AM, r...@open-mpi.org wrote:
>>
>> Well, that's a new one! I imagine we could modify the logic to allow a
>> combination of the oversubscribe and overload flags. It won't get out
>> until 2.1, though you could pull the patch in advance if it is holding
>> you up.
>>
>>> On Aug 23, 2016, at 11:46 PM, Ben Menadue <ben.mena...@nci.org.au> wrote:
>>>
>>> Hi,
>>>
>>> One of our users has noticed that binding is disabled in 2.0.0 when
>>> --oversubscribe is passed, which is hurting their performance, likely
>>> through migrations between sockets. It looks to be because of 294793c
>>> (PR#1228).
>>>
>>> They need to use --oversubscribe because, for some reason, the
>>> developers decided to run two processes for each MPI task (a compute
>>> process and an I/O worker process, I think). Since the second process
>>> in the pair is mostly idle, there's (almost) no harm in launching two
>>> processes per core - and it's better than leaving half the cores idle
>>> most of the time. In previous versions they were binding each pair to
>>> a core and letting the hyper-threads argue over which of the two
>>> processes to run, since this gave the best performance.
>>>
>>> I tried creating a rankfile and binding each process to its own
>>> hardware thread (along the lines of the sketch below), but it refuses
>>> to launch more processes than the number of cores (even if all these
>>> processes are on the first socket because of the binding) unless
>>> --oversubscribe is passed, which in turn disables the binding. Is there
>>> a way of bypassing the disable-binding-if-oversubscribing check
>>> introduced by that commit? Or can anyone think of a better way of
>>> running this program?
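>>>
>>> The rankfile had roughly this shape (a sketch of the format only: the
>>> hostname and indices are made up, and I'm showing the socket:core slot
>>> form, which puts each pair of ranks on one core rather than naming
>>> individual hardware threads; it presumably still needs the :overload
>>> qualifier to be accepted):
>>>
>>>     rank 0=node01 slot=0:0
>>>     rank 1=node01 slot=0:0
>>>     rank 2=node01 slot=0:1
>>>     rank 3=node01 slot=0:1
>>>     ...
>>>
>>> launched with something like
>>>
>>>     mpirun -np 32 --rankfile pairs.rf ./a.out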
>>>
>>> Alternatively, they could leave it with no binding at the mpirun level
>>> and do the binding in a wrapper (a sketch follows at the end of this
>>> message).
>>>
>>> Thanks,
>>> Ben
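>>>
>>> The wrapper could be as simple as this sketch (untested: it assumes
>>> hwloc's hwloc-bind is installed and uses Open MPI's
>>> OMPI_COMM_WORLD_LOCAL_RANK to pin each pair of local ranks to one
>>> physical core):
>>>
>>>     #!/bin/sh
>>>     # pairbind.sh: ranks 0,1 -> core 0; ranks 2,3 -> core 1; ...
>>>     core=$(( OMPI_COMM_WORLD_LOCAL_RANK / 2 ))
>>>     exec hwloc-bind core:${core} -- "$@"
>>>
>>> run with binding disabled at the mpirun level, e.g.
>>>
>>>     mpirun -np 32 --oversubscribe --bind-to none ./pairbind.sh ./a.out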