Folks,

let's look at the following trivial test program :

#include <mpi.h>
#include <stdio.h>

int main (int argc, char * argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf ("I am %d/%d and i abort\n", rank, size);
    MPI_Abort(MPI_COMM_WORLD, 2);
    printf ("%d/%d aborted !\n", rank, size);
    return 3;
}

and let's run mpirun (trunk) on node0 and ask the MPI tasks to run on node1 :

with two tasks or more :

node0 $ mpirun --mca btl tcp,self -host node1 -np 2 ./abort
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 2.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
I am 1/2 and i abort
I am 0/2 and i abort
[node0:00740] 1 more process has sent help message help-mpi-api.txt / mpi-abort
[node0:00740] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
node0 $ echo $?
0

The exit status of mpirun is zero
/* this is why the MPI_Errhandler_fatal_c test fails in mtt */

Now if we run only one task :

node0 $ mpirun --mca btl tcp,self -host node1 -np 1 ./abort
I am 0/1 and i abort
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 2.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 15884 on
node node1 exiting improperly. There are three reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

3. this process called "MPI_Abort" or "orte_abort" and the mca parameter
orte_create_session_dirs is set to false. In this case, the run-time cannot
detect that the abort call was an abnormal termination. Hence, the only
error message you will receive is this one.

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).

You can avoid this message by specifying -quiet on the mpirun command line.
--------------------------------------------------------------------------
node0 $ echo $?
1

The program displayed a misleading error message and mpirun exited with error code 1
/* I would have expected 2, or 3 in the worst case scenario */

I dug into it a bit and found a kind of race condition in orted (running on node1).
Basically, when the process dies, it writes stuff in the Open MPI session directory and exits.
Exiting sends a SIGCHLD to orted and closes the socket/pipe connected to orted.
On orted, the loss of connection is generally processed by libevent before the SIGCHLD and,
as a consequence, the exit code is not correctly set (e.g. it is left at zero).
I did not see any kind of communication between the MPI task and orted (except writing a file
in the Open MPI session directory), as I would have expected
/* but this was just my initial guess, the truth is I do not know what is supposed to happen */

I wrote the attached abort.patch to basically get it working.
I highly suspect this is not the right thing to do, so I did not commit it.

It works fine with two tasks or more.
With only one task, mpirun displays a misleading error message but the exit status is ok.

Could someone (Ralph ?) have a look at this ?

Cheers,

Gilles


node0 $ mpirun --mca btl tcp,self -host node1 -np 2 ./abort
I am 1/2 and i abort
I am 0/2 and i abort
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 2.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
[node0:00920] 1 more process has sent help message help-mpi-api.txt / mpi-abort
[node0:00920] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
node0 $ echo $?
2

node0 $ mpirun --mca btl tcp,self -host node1 -np 1 ./abort
I am 0/1 and i abort
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 2.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status,
thus causing the job to be terminated. The first process to do so was:

  Process name: [[7955,1],0]
  Exit code:    2
--------------------------------------------------------------------------
node0 $ echo $?
2

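
To illustrate the ordering problem described in the message, here is a minimal standalone sketch in plain POSIX C. It is not ORTE code and uses no libevent: the helper name exit_code_after_pipe_eof is hypothetical, and the pipe merely stands in for the socket/pipe between an MPI task and orted. The parent first observes the loss of connection (EOF on the pipe) and only then reaps the child, which shows that the exit code can still be recovered at that point without waiting for a SIGCHLD handler to run.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not part of Open MPI): fork a child that exits with
 * `code`, wait until the pipe to the child reports EOF (the "loss of
 * connection"), then reap the child with waitpid().
 * Returns the child's exit code, or -1 on error or abnormal termination. */
int exit_code_after_pipe_eof(int code)
{
    int fds[2];
    pid_t pid, r;
    ssize_t n;
    char buf;
    int status;

    assert(code >= 0 && code <= 255);

    if (pipe(fds) != 0) {
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* child: the write end is closed implicitly when the process exits */
        close(fds[0]);
        _exit(code);
    }

    /* parent: keep only the read end, so EOF means the child is gone */
    close(fds[1]);
    do {
        n = read(fds[0], &buf, 1);
    } while (n > 0 || (n < 0 && errno == EINTR));
    close(fds[0]);

    /* The connection is lost; the child may not have been reaped yet,
     * so collect its status explicitly, retrying on EINTR. */
    do {
        r = waitpid(pid, &status, 0);
    } while (r < 0 && errno == EINTR);

    if (n != 0 || r != pid || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}
```

Calling exit_code_after_pipe_eof(2) mirrors the test program's MPI_Abort(MPI_COMM_WORLD, 2) case. The sketch uses a blocking waitpid() to stay deterministic; an event-driven daemon would more likely want a non-blocking call.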
Index: orte/mca/odls/base/odls_base_default_fns.c
===================================================================
--- orte/mca/odls/base/odls_base_default_fns.c	(revision 32554)
+++ orte/mca/odls/base/odls_base_default_fns.c	(working copy)
@@ -15,6 +15,8 @@
  *                         All rights reserved.
  * Copyright (c) 2011-2013 Cisco Systems, Inc.  All rights reserved.
  * Copyright (c) 2013-2014 Intel, Inc. All rights reserved.
+ * Copyright (c) 2014      Research Organization for Information Science
+ *                         and Technology (RIST). All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -1772,6 +1774,20 @@
                          ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
                          ORTE_NAME_PRINT(&proc->name), (long)proc->pid);
 
+#if 1
+    /* the child might have died but the SIGCHLD event has not
+     * been processed by libevent yet. try to waitpid in order
+     * to set proc->exit_code */
+    if (ORTE_FLAG_TEST(proc, ORTE_PROC_FLAG_ALIVE)) {
+        int status;
+        /* FIXME : loop as long as errno == EINTR */
+        if (waitpid(proc->pid, &status, WNOHANG) == proc->pid) {
+            int state = ORTE_PROC_STATE_WAITPID_FIRED;
+            proc->exit_code = status;
+            ORTE_ACTIVATE_PROC_STATE(&proc->name, state);
+        }
+    }
+#endif
     /* if the child was previously flagged as dead, then just
      * update its exit status and
      * ensure that its exit state gets reported to avoid hanging
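
The FIXME in the patch asks for the waitpid() call to be retried while errno == EINTR. A minimal sketch of what that could look like follows, as standalone POSIX C rather than ORTE code. The names try_reap and decode_exit_code are hypothetical, and mapping a signal death to 128 + signal number is only the common shell convention, assumed here for illustration; how ORTE itself decodes the raw wait status is not shown in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: non-blocking reap of child `pid`, retrying when the
 * call is interrupted by a signal.
 * Returns 1 if the child was reaped (raw wait status stored in *status),
 *         0 if the child is still running,
 *        -1 on error (e.g. ECHILD because it was already reaped). */
int try_reap(pid_t pid, int *status)
{
    pid_t r;

    assert(status != NULL);

    do {
        r = waitpid(pid, status, WNOHANG);
    } while (r < 0 && errno == EINTR);

    if (r == pid) {
        return 1;
    }
    return (r == 0) ? 0 : -1;
}

/* Hypothetical helper: turn a raw wait status into a single exit code.
 * Normal exit -> the exit code; killed by a signal -> 128 + signal number
 * (shell convention, an assumption of this sketch); anything else -> -1. */
int decode_exit_code(int status)
{
    if (WIFEXITED(status)) {
        return WEXITSTATUS(status);
    }
    if (WIFSIGNALED(status)) {
        return 128 + WTERMSIG(status);
    }
    return -1;
}
```

With helpers like these, the patched block would call try_reap(proc->pid, &status) and activate the WAITPID_FIRED state only when it returns 1. Whether the raw status or a decoded value belongs in proc->exit_code depends on what the rest of ORTE expects, which this sketch does not try to settle.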