Folks,

Let's look at the following trivial test program:

#include <mpi.h>
#include <stdio.h>

int main (int argc, char * argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf ("I am %d/%d and i abort\n", rank, size);
    MPI_Abort(MPI_COMM_WORLD, 2);
    printf ("%d/%d aborted !\n", rank, size);
    return 3;
}

and let's run mpirun (trunk) on node0 and ask the MPI tasks to run on
node1, with two tasks or more:

node0 $ mpirun --mca btl tcp,self -host node1 -np 2 ./abort
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 2.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
I am 1/2 and i abort
I am 0/2 and i abort
[node0:00740] 1 more process has sent help message help-mpi-api.txt /
mpi-abort
[node0:00740] Set MCA parameter "orte_base_help_aggregate" to 0 to see
all help / error messages

node0 $ echo $?
0

The exit status of mpirun is zero, although the tasks called MPI_Abort
with errorcode 2.
/* this is why the MPI_Errhandler_fatal_c test fails in mtt */

Now, if we run only one task:

node0 $ mpirun --mca btl tcp,self -host node1 -np 1 ./abort
I am 0/1 and i abort
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 2.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 15884 on
node node1 exiting improperly. There are three reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

3. this process called "MPI_Abort" or "orte_abort" and the mca parameter
orte_create_session_dirs is set to false. In this case, the run-time cannot
detect that the abort call was an abnormal termination. Hence, the only
error message you will receive is this one.

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).

You can avoid this message by specifying -quiet on the mpirun command line.

--------------------------------------------------------------------------
node0 $ echo $?
1

The program displayed a misleading error message and mpirun exited with
error code 1.
/* I would have expected 2 (the MPI_Abort errorcode), or 3 (the return
value of main) in the worst case scenario */


I dug into it a bit and found a kind of race condition in orted (running
on node1).
Basically, when the process dies, it writes a file in the Open MPI session
directory and exits.
Exiting sends a SIGCHLD to orted and closes the socket/pipe connected to
orted.
On orted, libevent generally processes the loss of connection before the
SIGCHLD, and as a consequence the exit code is not correctly set (i.e. it
is left at zero).
I did not see any communication between the MPI task and orted (other
than writing a file in the Open MPI session directory), although I would
have expected some.
/* but this was just my initial guess, the truth is I do not know what
is supposed to happen */
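
To make the suspected ordering concrete, here is a minimal standalone sketch in plain POSIX C. It is not ORTE code: reap_child, decode_status and run_child_and_collect are hypothetical names used only for illustration. The parent notices the closed pipe first (the "loss of connection" event) and only then tries to collect the exit status with waitpid, retrying on EINTR.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not an ORTE function): call waitpid() and retry
 * while it is interrupted by a signal.
 * Returns 1 if the child was reaped (*status is set), 0 if it is not
 * reapable yet (only possible with WNOHANG), -1 on error. */
int reap_child(pid_t pid, int options, int *status)
{
    pid_t ret;
    do {
        ret = waitpid(pid, status, options);
    } while (ret == -1 && errno == EINTR);
    if (ret == pid) {
        return 1;
    }
    return (ret == 0) ? 0 : -1;
}

/* Turn a raw wait status into the value a shell would report in $?. */
int decode_status(int status)
{
    if (WIFEXITED(status)) {
        return WEXITSTATUS(status);
    }
    if (WIFSIGNALED(status)) {
        return 128 + WTERMSIG(status);
    }
    return -1;
}

/* Fork a child that exits with `code`, wait until its end of the pipe is
 * closed (the "lost connection" event), and only then collect the status. */
int run_child_and_collect(int code)
{
    int fds[2];
    int status = 0;
    int rc;
    char buf;
    ssize_t n;
    pid_t pid;

    if (pipe(fds) != 0) {
        return -1;
    }
    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        _exit(code);            /* the write end is closed when the child exits */
    }
    close(fds[1]);

    /* read() returns 0 (EOF) once the child has closed its write end */
    do {
        n = read(fds[0], &buf, 1);
    } while (n == -1 && errno == EINTR);
    close(fds[0]);

    /* EOF may be seen before the child is reapable, so a single
     * non-blocking attempt may return 0; fall back to a blocking wait. */
    rc = reap_child(pid, WNOHANG, &status);
    if (rc == 0) {
        rc = reap_child(pid, 0, &status);
    }
    return (rc == 1) ? decode_status(status) : -1;
}
```

In this sketch, run_child_and_collect(2) is expected to return 2. The blocking fallback is only there to keep the example deterministic; an event-driven daemon such as orted would presumably wait for the SIGCHLD event instead of blocking.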

I wrote the attached abort.patch to basically get it working.
I highly suspect this is not the right thing to do, so I did not commit it.

With the patch, it works fine with two tasks or more.
With only one task, mpirun displays a misleading error message, but the
exit status is ok.

Could someone (Ralph?) have a look at this?

Cheers,

Gilles


node0 $ mpirun --mca btl tcp,self -host node1 -np 2 ./abort
I am 1/2 and i abort
I am 0/2 and i abort
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 2.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
[node0:00920] 1 more process has sent help message help-mpi-api.txt /
mpi-abort
[node0:00920] Set MCA parameter "orte_base_help_aggregate" to 0 to see
all help / error messages
node0 $ echo $?
2



node0 $ mpirun --mca btl tcp,self -host node1 -np 1 ./abort
I am 0/1 and i abort
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 2.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status,
thus causing
the job to be terminated. The first process to do so was:

  Process name: [[7955,1],0]
  Exit code:    2
--------------------------------------------------------------------------
node0 $ echo $?
2



Index: orte/mca/odls/base/odls_base_default_fns.c
===================================================================
--- orte/mca/odls/base/odls_base_default_fns.c  (revision 32554)
+++ orte/mca/odls/base/odls_base_default_fns.c  (working copy)
@@ -15,6 +15,8 @@
  *                         All rights reserved.
  * Copyright (c) 2011-2013 Cisco Systems, Inc.  All rights reserved.
  * Copyright (c) 2013-2014 Intel, Inc.  All rights reserved.
+ * Copyright (c) 2014      Research Organization for Information Science
+ *                         and Technology (RIST). All rights reserved.
  * $COPYRIGHT$
  * 
  * Additional copyrights may follow
@@ -1772,6 +1774,20 @@
                         ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
                         ORTE_NAME_PRINT(&proc->name), (long)proc->pid);

+#if 1
+    /* the child might have died but the SIGCHLD event has not
+     * been processed by libevent yet. try to waitpid in order
+     * to set proc->exit_code */
+    if (ORTE_FLAG_TEST(proc, ORTE_PROC_FLAG_ALIVE)) {
+        int status;
+        /* FIXME : loop as long as errno == EINTR */
+        if (waitpid(proc->pid, &status, WNOHANG) == proc->pid) {
+            int state = ORTE_PROC_STATE_WAITPID_FIRED;
+            proc->exit_code = status;
+            ORTE_ACTIVATE_PROC_STATE(&proc->name, state);
+        }
+    }
+#endif
     /* if the child was previously flagged as dead, then just
      * update its exit status and
      * ensure that its exit state gets reported to avoid hanging
