Looks fine to me - CMR filed. Thanks!

On Nov 8, 2011, at 1:01 AM, nadia.derbey wrote:

> Hi,
> 
> In v1.5, when mpirun is called with both the "-bind-to-core" and
> "-npersocket" options, and the npersocket value leads to fewer procs than
> sockets allocated on one node, we get a crash (a SIGFPE, see below)
> 
> Testing environment:
> openmpi v1.5
> 2 nodes with 4 eight-core sockets each
> mpirun -n 10 -bind-to-core -npersocket 2
> 
> I was expecting to get:
>   . ranks 0-1 : node 0 - socket 0
>   . ranks 2-3 : node 0 - socket 1
>   . ranks 4-5 : node 0 - socket 2
>   . ranks 6-7 : node 0 - socket 3
>   . ranks 8-9 : node 1 - socket 0
> 
> Instead, everything worked fine on node 0, and I got a crash on node 1
> (SIGFPE, integer divide-by-zero), with a stack that looks like:
> 
> [derbeyn@berlin18 ~]$ mpirun --host berlin18,berlin26 -n 10
> -bind-to-core -npersocket 2 sleep 900
> [berlin26:21531] *** Process received signal ***
> [berlin26:21531] Signal: Floating point exception (8)
> [berlin26:21531] Signal code: Integer divide-by-zero (1)
> [berlin26:21531] Failing at address: 0x7fed13731d63
> [berlin26:21531] [ 0] /lib64/libpthread.so.0(+0xf490) [0x7fed15327490]
> [berlin26:21531] [ 1] /home_nfs/derbeyn/DISTS/openmpi-v1.5/lib/openmpi/mca_odls_default.so(+0x2d63) [0x7fed13731d63]
> [berlin26:21531] [ 2] /home_nfs/derbeyn/DISTS/openmpi-v1.5/lib/libopen-rte.so.3(orte_odls_base_default_launch_local+0xaf3) [0x7fed15e1fe73]
> [berlin26:21531] [ 3] /home_nfs/derbeyn/DISTS/openmpi-v1.5/lib/openmpi/mca_odls_default.so(+0x1d10) [0x7fed13730d10]
> [berlin26:21531] [ 4] /home_nfs/derbeyn/DISTS/openmpi-v1.5/lib/libopen-rte.so.3(+0x3804d) [0x7fed15e1004d]
> [berlin26:21531] [ 5] /home_nfs/derbeyn/DISTS/openmpi-v1.5/lib/libopen-rte.so.3(orte_daemon_cmd_processor+0x4aa) [0x7fed15e1209a]
> [berlin26:21531] [ 6] /home_nfs/derbeyn/DISTS/openmpi-v1.5/lib/libopen-rte.so.3(+0x74ee8) [0x7fed15e4cee8]
> [berlin26:21531] [ 7] /home_nfs/derbeyn/DISTS/openmpi-v1.5/lib/libopen-rte.so.3(orte_daemon+0x8d8) [0x7fed15e0f268]
> [berlin26:21531] [ 8] /home_nfs/derbeyn/DISTS/openmpi-v1.5/bin/orted() [0x4008c6]
> [berlin26:21531] [ 9] /lib64/libc.so.6(__libc_start_main+0xfd) [0x7fed14fa7c9d]
> [berlin26:21531] [10] /home_nfs/derbeyn/DISTS/openmpi-v1.5/bin/orted() [0x400799]
> [berlin26:21531] *** End of error message ***
> 
> The reason for this issue is that the npersocket value is taken into
> account during the very first phase of mpirun (rmaps/load_balance) to
> claim the slots on each node:
> npersocket() (in rmaps/load_balance/rmaps_lb.c) claims
>   . 8 slots on node 0 (4 sockets * 2 persocket)
>   . 2 slots on node 1 (10 total ranks - 8 already claimed)
> 
> But when we come to odls_default_fork_local_proc() (in
> odls/default/odls_default_module.c) npersocket is actually recomputed.
> Everything works fine on node 0. But on node 1, we have:
>   . jobdat->policy has both ORTE_BIND_TO_CORE and ORTE_MAPPING_NPERXXX
>   . npersocket is recomputed the following way:
>     npersocket = jobdat->num_local_procs/orte_odls_globals.num_sockets
>                = 2 / 4 = 0
>   . later on, when the starting point is computed:
>     logical_cpu = (lrank % npersocket) * jobdat->cpus_per_rank;
>     we get the divide-by-zero exception.
> 
> The problem, in my view, is that we recompute npersocket on the local
> nodes instead of storing it in the jobdat structure (as is done today
> for the policy, the cpus_per_rank, the stride, ...).
> Recomputing this value leads either to the crash I got, or to wrong
> mappings: if 4 slots had been claimed on node 1, the result would have
> been 1 rank per socket (since the nodes have 4 sockets) instead of
> 2 ranks on each of the first 2 sockets.
> 
> The attached patch is a fix proposal implementing my suggestion of
> storing the npersocket into the jobdat.
> 
> This patch applies on v1.5. Waiting for your comments...
> 
> Regards,
> Nadia
> 
> -- 
> Nadia Derbey
> <001_dont_recompute_npersocket_on_local_nodes.patch>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

