Looks fine to me - CMR filed. Thanks!

On Nov 8, 2011, at 1:01 AM, nadia.derbey wrote:
> Hi,
>
> In v1.5, when mpirun is called with both the "-bind-to-core" and
> "-npersocket" options, and the npersocket value leads to fewer procs than
> sockets allocated on one node, we get a segfault.
>
> Testing environment:
> openmpi v1.5
> 2 nodes with four 8-core sockets each
> mpirun -n 10 -bind-to-core -npersocket 2
>
> I was expecting to get:
> . ranks 0-1 : node 0 - socket 0
> . ranks 2-3 : node 0 - socket 1
> . ranks 4-5 : node 0 - socket 2
> . ranks 6-7 : node 0 - socket 3
> . ranks 8-9 : node 1 - socket 0
>
> Instead, everything worked fine on node 0, and I got a segfault on
> node 1, with a stack that looks like:
>
> [derbeyn@berlin18 ~]$ mpirun --host berlin18,berlin26 -n 10 -bind-to-core -npersocket 2 sleep 900
> [berlin26:21531] *** Process received signal ***
> [berlin26:21531] Signal: Floating point exception (8)
> [berlin26:21531] Signal code: Integer divide-by-zero (1)
> [berlin26:21531] Failing at address: 0x7fed13731d63
> [berlin26:21531] [ 0] /lib64/libpthread.so.0(+0xf490) [0x7fed15327490]
> [berlin26:21531] [ 1] /home_nfs/derbeyn/DISTS/openmpi-v1.5/lib/openmpi/mca_odls_default.so(+0x2d63) [0x7fed13731d63]
> [berlin26:21531] [ 2] /home_nfs/derbeyn/DISTS/openmpi-v1.5/lib/libopen-rte.so.3(orte_odls_base_default_launch_local+0xaf3) [0x7fed15e1fe73]
> [berlin26:21531] [ 3] /home_nfs/derbeyn/DISTS/openmpi-v1.5/lib/openmpi/mca_odls_default.so(+0x1d10) [0x7fed13730d10]
> [berlin26:21531] [ 4] /home_nfs/derbeyn/DISTS/openmpi-v1.5/lib/libopen-rte.so.3(+0x3804d) [0x7fed15e1004d]
> [berlin26:21531] [ 5] /home_nfs/derbeyn/DISTS/openmpi-v1.5/lib/libopen-rte.so.3(orte_daemon_cmd_processor+0x4aa) [0x7fed15e1209a]
> [berlin26:21531] [ 6] /home_nfs/derbeyn/DISTS/openmpi-v1.5/lib/libopen-rte.so.3(+0x74ee8) [0x7fed15e4cee8]
> [berlin26:21531] [ 7] /home_nfs/derbeyn/DISTS/openmpi-v1.5/lib/libopen-rte.so.3(orte_daemon+0x8d8) [0x7fed15e0f268]
> [berlin26:21531] [ 8] /home_nfs/derbeyn/DISTS/openmpi-v1.5/bin/orted() [0x4008c6]
> [berlin26:21531] [ 9] /lib64/libc.so.6(__libc_start_main+0xfd) [0x7fed14fa7c9d]
> [berlin26:21531] [10] /home_nfs/derbeyn/DISTS/openmpi-v1.5/bin/orted() [0x400799]
> [berlin26:21531] *** End of error message ***
>
> The reason for this issue is that the npersocket value is taken into
> account during the very first phase of mpirun (rmaps/load_balance) to
> claim the slots on each node:
> npersocket() (in rmaps/load_balance/rmaps_lb.c) claims
> . 8 slots on node 0 (4 sockets * 2 per socket)
> . 2 slots on node 1 (10 total ranks - 8 already claimed)
>
> But by the time we reach odls_default_fork_local_proc() (in
> odls/default/odls_default_module.c), npersocket is actually recomputed.
> Everything works fine on node 0, but on node 1 we have:
> . jobdat->policy has both ORTE_BIND_TO_CORE and ORTE_MAPPING_NPERXXX
> . npersocket is recomputed the following way:
>   npersocket = jobdat->num_local_procs / orte_odls_globals.num_sockets
>              = 2 / 4 = 0
> . later on, when the starting point is computed:
>   logical_cpu = (lrank % npersocket) * jobdat->cpus_per_rank;
>   we get the divide-by-zero exception.
>
> To my mind, the problem is that we recompute npersocket on the local
> nodes instead of storing it in the jobdat structure (as is done today
> for the policy, the cpus_per_rank, the stride, ...).
> Recomputing this value leads either to the segfault I got, or to wrong
> mappings: if we had had 4 slots claimed on node 1, the result would
> have been 1 rank per socket (since we have 4-socket nodes) instead of
> 2 ranks on each of the first 2 sockets.
>
> The attached patch is a fix proposal implementing my suggestion of
> storing the npersocket in the jobdat.
>
> This patch applies on v1.5. Waiting for your comments...
>
> Regards,
> Nadia
>
> --
> Nadia Derbey
> <001_dont_recompute_npersocket_on_local_nodes.patch>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel