Re: [OMPI devel] Assigning processes to cores 1.4.2, 1.6.4 and 1.8.4

Ralph Castain Fri, 10 Apr 2015 09:55:02 -0400 (EDT)

Actually, I believe from the cmd line that the questioner wanted each process 
to be bound to a single core.


From your output, I’m guessing you have hyperthreads enabled on your system - 
yes? In that case, the 1.4 series is likely to be binding each process to a 
single HT because it isn’t sophisticated enough to realize the difference 
between HT and core.
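
One way to confirm the hyperthreading guess on Linux is to ask the kernel which logical CPUs share a core. The following is only a sketch: the helper names `parse_cpulist` and `thread_siblings` are made up for illustration, and it assumes the standard sysfs topology files are readable on the compute node.

```python
from pathlib import Path


def parse_cpulist(text):
    """Expand a Linux cpulist string such as "0,16" or "0-3,8" into sorted ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def thread_siblings(cpu, sysfs_root="/sys/devices/system/cpu"):
    """Return the logical CPUs that share a physical core with `cpu`.

    Reads the kernel's thread_siblings_list file; raises OSError if the
    file is not available (for example on a non-Linux system).
    """
    path = Path(sysfs_root) / f"cpu{cpu}" / "topology" / "thread_siblings_list"
    return parse_cpulist(path.read_text())
```

If `thread_siblings(0)` returns two CPU numbers rather than one, hyperthreading is enabled and those two logical CPUs are the HTs of the same core.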

Later versions of OMPI do know the difference. When you tell OMPI to bind to 
core, it will bind you to -both- HTs of that core. Hence the output you showed 
here:

> here is the map using just --mca mpi_paffinity_alone 1
> 
>   PID COMMAND  CPUMASK    TOTAL [     N0     N1     N2     N3     N4     N5 ]
> 25846 prog1       0,16    60.6M [  60.6M      0      0      0      0      0 ]
> 25847 prog1       2,18    60.6M [  60.6M      0      0      0      0      0 ]
> 25848 prog1       4,20    60.6M [  60.6M      0      0      0      0      0 ]
> 25849 prog1       6,22    60.6M [  60.6M      0      0      0      0      0 ]
> 25850 prog1       8,24    60.6M [  60.6M      0      0      0      0      0 ]
> 25851 prog1      10,26    60.6M [  60.6M      0      0      0      0      0 ]
> 25852 prog1      12,28    60.6M [  60.6M      0      0      0      0      0 ]
> 25853 prog1      14,30    60.6M [  60.6M      0      0      0      0      0 ]


When you tell us bind-to socket, we bind you to every HT in that socket. Since 
you are running no more than 8 processes (one socket's worth of cores), and we 
map-by core by default, all the processes are bound to the first socket. This 
is what you show in this output:

> We get the following process map (this output is with mpirun args
> --bind-to-socket --mca mpi_paffinity_alone 1):
> 
>   PID COMMAND CPUMASK                                       TOTAL [     N0     N1     N2     N3     N4     N5 ]
> 24176 prog1   0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30    60.2M [  60.2M      0      0      0      0      0 ]
> 24177 prog1   0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30    60.5M [  60.5M      0      0      0      0      0 ]
> 24178 prog1   0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30    60.5M [  60.5M      0      0      0      0      0 ]
> 24179 prog1   0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30    60.5M [  60.5M      0      0      0      0      0 ]
> 24180 prog1   0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30    60.5M [  60.5M      0      0      0      0      0 ]
> 24181 prog1   0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30    60.5M [  60.5M      0      0      0      0      0 ]
> 24182 prog1   0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30    60.5M [  60.5M      0      0      0      0      0 ]
> 24183 prog1   0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30    60.5M [  60.5M      0      0      0      0      0 ]


So it looks to me like OMPI is doing exactly what you requested. I admit the HT 
numbering in the cpumask is strange, but that’s the way your BIOS numbered them.
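
To make the reading of those cpumasks concrete, here is a small sketch that classifies a mask as a single hardware thread, a full core, or a full socket. It rests on assumptions inferred from the output in this thread, not on anything OMPI or hwloc reports: the HT sibling of logical CPU n is taken to be n + 16 (from masks such as 0,16), and logical CPUs are taken to alternate between the two sockets (from the socket mask 0,2,...,30). The names `SIBLING_OFFSET`, `core_of`, `socket_of` and `classify` are made up for illustration.

```python
SIBLING_OFFSET = 16  # assumption: HT sibling of CPU n is n + 16
NUM_SOCKETS = 2      # from the original post: 2 sockets, 8 cores each
NUM_CPUS = 2 * SIBLING_OFFSET  # 32 logical CPUs under these assumptions


def core_of(cpu):
    """Identify a core by its lower-numbered hardware thread."""
    return cpu % SIBLING_OFFSET


def socket_of(cpu):
    """Assumption: even-numbered CPUs sit on socket 0, odd-numbered on socket 1."""
    return cpu % NUM_SOCKETS


def classify(mask):
    """Classify a cpumask as 'hwthread', 'core', 'socket' or 'other'."""
    cpus = set(mask)
    if len(cpus) == 1:
        return "hwthread"
    if len(cpus) == 2 and len({core_of(c) for c in cpus}) == 1:
        return "core"
    sockets = {socket_of(c) for c in cpus}
    if len(sockets) == 1:
        socket = next(iter(sockets))
        if cpus == {c for c in range(NUM_CPUS) if socket_of(c) == socket}:
            return "socket"
    return "other"
```

Under these assumptions, a mask like 0,16 classifies as a core, the mask 0,2,...,30 as a socket, and each of the single-CPU masks from the 1.4.2 run as one hardware thread.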

HTH
Ralph


> On Apr 10, 2015, at 6:29 AM, Nick Papior Andersen <nickpap...@gmail.com> wrote:
> 
> Correction: the modifiers should be comma-separated, i.e. "span,pe=2"
> (--map-by ppr:4:socket:span,pe=2)
> 
> 2015-04-10 15:28 GMT+02:00 Nick Papior Andersen <nickpap...@gmail.com>:
> I guess you want process #1 to have core 0 and core 1 bound to it, process #2 
> have core 2 and core 3 bound?
> 
> I can do this with (I do this with 1.8.4, I do not think it works with 1.6.x):
> --map-by ppr:4:socket:span:pe=2
> ppr = processes per resource.
> socket = the resource
> span = load balance the processes
> pe = number of processing elements to bind to each process
> 
> This should launch 8 processes (you have 2 sockets). Each process should have 
> 2 processing elements bound to it.
> You can check the resulting bindings with --report-bindings.
> 
> 2015-04-10 15:16 GMT+02:00 <twu...@goodyear.com>:
> 
> We can't seem to get "processor affinity" using 1.6.4 or newer OpenMPI.
> 
> Note this is a 2 socket machine with 8 cores per socket
> 
> We had compiled OpenMPI 1.4.2 with the following configure options:
> 
> ===========================================================================
> export CC=/apps/share/intel/v14.0.4.211/bin/icc
> export CXX=/apps/share/intel/v14.0.4.211/bin/icpc
> export FC=/apps/share/intel/v14.0.4.211/bin/ifort
> 
> version=1.4.2.I1404211
> 
> ./configure \
>     --prefix=/apps/share/openmpi/$version \
>     --disable-shared \
>     --enable-static \
>     --enable-shared=no \
>     --with-openib \
>     --with-libnuma=/usr \
>     --enable-mpirun-prefix-by-default \
>     --with-memory-manager=none \
>     --with-tm=/apps/share/TORQUE/current/Linux
> ===========================================================================
> 
> and then used this mpirun command (where we used 8 cores):
> 
> ===========================================================================
> /apps/share/openmpi/1.4.2.I1404211/bin/mpirun \
> --prefix /apps/share/openmpi/1.4.2.I1404211 \
> --mca mpi_paffinity_alone 1 \
> --mca btl openib,tcp,sm,self \
> --x LD_LIBRARY_PATH \
> {model args}
> ===========================================================================
> 
> And when we checked the process map, it looks like this:
> 
>   PID COMMAND  CPUMASK    TOTAL [     N0     N1     N2     N3     N4     N5 ]
> 22232 prog1          0   469.9M [ 469.9M      0      0      0      0      0 ]
> 22233 prog1          1   479.0M [   4.0M 475.0M      0      0      0      0 ]
> 22234 prog1          2   516.7M [ 516.7M      0      0      0      0      0 ]
> 22235 prog1          3   485.4M [   8.0M 477.4M      0      0      0      0 ]
> 22236 prog1          4   482.6M [ 482.6M      0      0      0      0      0 ]
> 22237 prog1          5   486.6M [   6.0M 480.6M      0      0      0      0 ]
> 22238 prog1          6   481.3M [ 481.3M      0      0      0      0      0 ]
> 22239 prog1          7   419.4M [   8.0M 411.4M      0      0      0      0 ]
> 
> Now with 1.6.4 and higher, we did the following:
> ===========================================================================
> export CC=/apps/share/intel/v14.0.4.211/bin/icc
> export CXX=/apps/share/intel/v14.0.4.211/bin/icpc
> export FC=/apps/share/intel/v14.0.4.211/bin/ifort
> 
> version=1.6.4.I1404211
> 
> ./configure \
>     --disable-vt \
>     --prefix=/apps/share/openmpi/$version \
>     --disable-shared \
>     --enable-static \
>     --with-verbs \
>     --enable-mpirun-prefix-by-default \
>     --with-memory-manager=none \
>     --with-hwloc \
>     --enable-mpi-ext \
>     --with-tm=/apps/share/TORQUE/current/Linux
> ===========================================================================
> 
> We've tried the same mpirun command with -bind-to-core, with -bind-to-core 
> -bycore, etc., and I can't seem to get the right combination of args to get 
> the same behavior as 1.4.2.
> 
> We get the following process map (this output is with mpirun args
> --bind-to-socket --mca mpi_paffinity_alone 1):
> 
>   PID COMMAND CPUMASK                                       TOTAL [     N0     N1     N2     N3     N4     N5 ]
> 24176 prog1   0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30    60.2M [  60.2M      0      0      0      0      0 ]
> 24177 prog1   0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30    60.5M [  60.5M      0      0      0      0      0 ]
> 24178 prog1   0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30    60.5M [  60.5M      0      0      0      0      0 ]
> 24179 prog1   0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30    60.5M [  60.5M      0      0      0      0      0 ]
> 24180 prog1   0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30    60.5M [  60.5M      0      0      0      0      0 ]
> 24181 prog1   0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30    60.5M [  60.5M      0      0      0      0      0 ]
> 24182 prog1   0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30    60.5M [  60.5M      0      0      0      0      0 ]
> 24183 prog1   0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30    60.5M [  60.5M      0      0      0      0      0 ]
> 
> here is the map using just --mca mpi_paffinity_alone 1
> 
>   PID COMMAND  CPUMASK    TOTAL [     N0     N1     N2     N3     N4     N5 ]
> 25846 prog1       0,16    60.6M [  60.6M      0      0      0      0      0 ]
> 25847 prog1       2,18    60.6M [  60.6M      0      0      0      0      0 ]
> 25848 prog1       4,20    60.6M [  60.6M      0      0      0      0      0 ]
> 25849 prog1       6,22    60.6M [  60.6M      0      0      0      0      0 ]
> 25850 prog1       8,24    60.6M [  60.6M      0      0      0      0      0 ]
> 25851 prog1      10,26    60.6M [  60.6M      0      0      0      0      0 ]
> 25852 prog1      12,28    60.6M [  60.6M      0      0      0      0      0 ]
> 25853 prog1      14,30    60.6M [  60.6M      0      0      0      0      0 ]
> 
> I figure I am compiling incorrectly or using the wrong mpirun args.
> 
> Can someone tell me how to duplicate the behavior of 1.4.2 regarding binding 
> the processes to cores?
> 
> Any help appreciated..
> 
> thanks
> 
> tom
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: http://www.open-mpi.org/community/lists/devel/2015/04/17205.php
> 
> 
> 
> -- 
> Kind regards Nick
