Hmmm...bet I know why. Let me poke a bit.

> On Aug 24, 2016, at 5:18 PM, Ben Menadue <ben.mena...@nci.org.au> wrote:
> 
> Actually, adding :oversubscribe to the --map-by option still disables 
> binding, even with :overload on the --bind-to option. While the :overload 
> option allows binding more than one process per CPU, it only has an effect if 
> binding actually happens - i.e. without :oversubscribe.
> 
> So, on one of our login nodes (2x8-core),
> 
>  mpirun --np 32 --bind-to core:overload --report-bindings true
> 
> works and does what you would expect (0 and 16 on core 0, 1 and 17 on core 1, 
> ...), while inside a PBS job on a compute node (same hardware) it fails with 
> "not enough slots available in the system". Adding --map-by 
> core:oversubscribe makes this to work, but then doesn't have binding.
> 
> Cheers,
> Ben
> 
> -----Original Message-----
> From: devel [mailto:devel-boun...@lists.open-mpi.org] On Behalf Of Ben Menadue
> Sent: Thursday, 25 August 2016 9:36 AM
> To: 'Open MPI Developers' <devel@lists.open-mpi.org>
> Subject: Re: [OMPI devel] Binding with --oversubscribe in 2.0.0
> 
> Hi Ralph,
> 
> Thanks for that... that option's not on the man page for mpirun, but I can 
> see it in the --help message (as "overload-allowed", which also works).
> 
> Cheers,
> Ben
> 
> 
> -----Original Message-----
> From: devel [mailto:devel-boun...@lists.open-mpi.org] On Behalf Of 
> r...@open-mpi.org
> Sent: Thursday, 25 August 2016 2:03 AM
> To: OpenMPI Devel <devel@lists.open-mpi.org>
> Subject: Re: [OMPI devel] Binding with --oversubscribe in 2.0.0
> 
> Actually, I stand corrected! Someone must have previously requested it, 
> because support already exists.
> 
> What you need to do is simply specify the desired binding. If you don’t 
> specify one, then we will disable it by default when oversubscribed. This was 
> done to protect performance for those who don’t have such scenarios, and 
> don’t realize we are otherwise binding by default.
> 
> So in your case, you’d want something like:
> 
> mpirun --map-by core:oversubscribe --bind-to core:overload
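> 
> Adding --report-bindings (and a dummy program) should confirm the placement, e.g.:
> 
>  mpirun -np 32 --map-by core:oversubscribe --bind-to core:overload --report-bindings true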
> 
> HTH
> Ralph
> 
>> On Aug 24, 2016, at 7:33 AM, r...@open-mpi.org wrote:
>> 
>> Well, that’s a new one! I imagine we could modify the logic to allow a 
>> combination of oversubscribe and overload flags. Won’t get out until 2.1, 
>> though you could pull the patch in advance if it is holding you up.
>> 
>> 
>>> On Aug 23, 2016, at 11:46 PM, Ben Menadue <ben.mena...@nci.org.au> wrote:
>>> 
>>> Hi,
>>> 
>>> One of our users has noticed that binding is disabled in 2.0.0 when 
>>> --oversubscribe is passed, which is hurting their performance, likely 
>>> through migrations between sockets. It looks to be because of 294793c 
>>> (PR#1228).
>>> 
>>> They need to use --oversubscribe because, for some reason, the 
>>> developers decided to run two processes for each MPI task (a 
>>> compute process and an I/O worker process, I think). Since the second 
>>> process in the pair is mostly idle, there's (almost) no harm in 
>>> launching two processes per core - and it's better than leaving half 
>>> the cores idle most of the time. In previous versions they were 
>>> binding each pair to a core and letting the hyper-threads argue over 
>>> which of the two processes to run, since this gave the best performance.
>>> 
>>> I tried creating a rankfile and binding each process to its own 
>>> hardware thread, but it refuses to launch more processes than the 
>>> number of cores (even if all these processes are on the first socket 
>>> because of the binding) unless --oversubscribe is passed, which in 
>>> turn disables binding. Is there a way of bypassing the 
>>> disable-binding-if-oversubscribing check introduced by that commit? Or can 
>>> anyone think of a better way of running this program?
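>>> For concreteness, the rankfile looked roughly like this (the hostname 
>>> and slot numbers are placeholders; slots are given as socket:core 
>>> pairs, with two ranks per core):
>>> 
>>>  rank 0=node01 slot=0:0
>>>  rank 1=node01 slot=0:1
>>>  ...
>>>  rank 16=node01 slot=0:0
>>>  rank 17=node01 slot=0:1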
>>> 
>>> Alternatively, they could leave it with no binding at the mpirun 
>>> level and do the binding in a wrapper.
>>> 
>>> Thanks,
>>> Ben
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> devel mailing list
>>> devel@lists.open-mpi.org
>>> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
>> 
