Okay, I found the issue and fixed it:

https://github.com/open-mpi/ompi-release/pull/1340

We are very close to the v2.0.1 release, so the fix may not make it into that 
one. Still, you are welcome to pull down the patch and apply it locally if that 
would help.

Ralph

> On Aug 24, 2016, at 5:29 PM, r...@open-mpi.org wrote:
> 
> Hmmm...bet I know why. Let me poke a bit.
> 
>> On Aug 24, 2016, at 5:18 PM, Ben Menadue <ben.mena...@nci.org.au> wrote:
>> 
>> Actually, adding :oversubscribe to the --map-by option still disables 
>> binding, even with :overload on the --bind-to option. While the :overload 
>> option allows binding more than one process per CPU, it only has an effect 
>> if binding actually happens, i.e. without :oversubscribe.
>> 
>> So, on one of our login nodes (2x8-core),
>> 
>> mpirun --np 32 --bind-to core:overload --report-bindings true
>> 
>> works and does what you would expect (ranks 0 and 16 on core 0, 1 and 17 on 
>> core 1, ...), while inside a PBS job on a compute node (same hardware) it 
>> fails with "not enough slots available in the system". Adding --map-by 
>> core:oversubscribe makes it work, but then the processes are not bound.
>> 
>> Cheers,
>> Ben
>> 
>> -----Original Message-----
>> From: devel [mailto:devel-boun...@lists.open-mpi.org] On Behalf Of Ben Menadue
>> Sent: Thursday, 25 August 2016 9:36 AM
>> To: 'Open MPI Developers' <devel@lists.open-mpi.org>
>> Subject: Re: [OMPI devel] Binding with --oversubscribe in 2.0.0
>> 
>> Hi Ralph,
>> 
>> Thanks for that... that option isn't in the mpirun man page, but I can 
>> see it in the --help message (as "overload-allowed", which also works).
>> 
>> Cheers,
>> Ben
>> 
>> 
>> -----Original Message-----
>> From: devel [mailto:devel-boun...@lists.open-mpi.org] On Behalf Of r...@open-mpi.org
>> Sent: Thursday, 25 August 2016 2:03 AM
>> To: OpenMPI Devel <devel@lists.open-mpi.org>
>> Subject: Re: [OMPI devel] Binding with --oversubscribe in 2.0.0
>> 
>> Actually, I stand corrected! Someone must have previously requested it, 
>> because support already exists.
>> 
>> What you need to do is simply specify the desired binding. If you don’t 
>> specify one, we disable binding by default when oversubscribed. This was 
>> done to protect performance for users who don’t have this kind of scenario 
>> and don’t realize that we otherwise bind by default.
>> 
>> So in your case, you’d want something like:
>> 
>> mpirun --map-by core:oversubscribe --bind-to core:overload
>> 
>> HTH
>> Ralph
>> 
>>> On Aug 24, 2016, at 7:33 AM, r...@open-mpi.org wrote:
>>> 
>>> Well, that’s a new one! I imagine we could modify the logic to allow a 
>>> combination of the oversubscribe and overload flags. It won’t get out until 
>>> 2.1, though you could pull the patch in advance if it is holding you up.
>>> 
>>> 
>>>> On Aug 23, 2016, at 11:46 PM, Ben Menadue <ben.mena...@nci.org.au> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> One of our users has noticed that binding is disabled in 2.0.0 when 
>>>> --oversubscribe is passed, which is hurting their performance, likely 
>>>> through process migrations between sockets. This appears to be caused by 
>>>> commit 294793c (PR#1228).
>>>> 
>>>> They need to use --oversubscribe because the developers decided, for 
>>>> some reason, to run two processes for each MPI task (a compute process 
>>>> and an I/O worker process, I think). Since the second process in the pair 
>>>> is mostly idle, there's (almost) no harm in launching two processes per 
>>>> core, and it's better than leaving half the cores idle most of the time. 
>>>> In previous versions they bound each pair to a core and let the 
>>>> hyper-threads argue over which of the two processes to run, since this 
>>>> gave the best performance.
>>>> 
>>>> I tried creating a rankfile and binding each process to its own 
>>>> hardware thread, but mpirun refuses to launch more processes than the 
>>>> number of cores (even when, because of the binding, all of these 
>>>> processes are on the first socket) unless --oversubscribe is passed, 
>>>> which in turn disables the binding. Is there a way of bypassing the 
>>>> disable-binding-if-oversubscribing check introduced by that commit? Or 
>>>> can anyone think of a better way of running this program?
>>>> 
>>>> Alternatively, they could leave it with no binding at the mpirun 
>>>> level and do the binding in a wrapper.
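The wrapper approach mentioned in the paragraph above might be sketched as a small POSIX shell script. This is a minimal sketch under assumptions that do not come from this thread: the script name bind-wrapper.sh and the program ./app are hypothetical, hwloc's hwloc-bind utility is assumed to be installed on the compute nodes, Open MPI's mpirun is assumed to export OMPI_COMM_WORLD_LOCAL_RANK to each process, and each node is assumed to have 16 physical cores (the 2x8-core nodes mentioned in the thread). The round-robin mapping matches the layout Ben describes for --bind-to core:overload, with local ranks 0 and 16 sharing core 0, 1 and 17 sharing core 1, and so on.

```shell
#!/bin/sh
# bind-wrapper.sh (hypothetical name): bind the calling MPI process to one
# physical core with hwloc-bind, then exec the real program in its place.
# Assumptions: hwloc-bind is installed, mpirun sets
# OMPI_COMM_WORLD_LOCAL_RANK, and each node has CORES_PER_NODE cores.
CORES_PER_NODE=${CORES_PER_NODE:-16}

# Map a node-local rank to a logical core index, round-robin:
# ranks 0 and 16 share core 0, ranks 1 and 17 share core 1, and so on.
core_for_rank() {
    echo $(( $1 % $CORES_PER_NODE ))
}

# Only bind and exec when a command was given, so the mapping function
# above can also be checked on its own.
if [ "$#" -gt 0 ]; then
    rank=${OMPI_COMM_WORLD_LOCAL_RANK:?not started by Open MPI mpirun}
    exec hwloc-bind "core:$(core_for_rank "$rank")" -- "$@"
fi
```

A launch line might then look like the following, so that mpirun itself does no binding and the wrapper does it instead:

mpirun --np 32 --oversubscribe --bind-to none ./bind-wrapper.sh ./app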
>>>> 
>>>> Thanks,
>>>> Ben
>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> devel mailing list
>>>> devel@lists.open-mpi.org
>>>> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
>>> 
>> 
> 