Ralph,

I am getting consistent segfaults during the infrastructure teardown in orterun (I noticed them on OS X). After digging a little, it turns out that the opal_buffer_t class has already been cleaned up in orte_finalize by the time orte_proc_info_finalize is called, so the destructors end up being called on randomly initialized memory. If I change the order of the teardown so that orte_proc_info_finalize runs before orte_finalize, things work better, but I still get a very annoying warning about a "Bad file descriptor in select".

Any better fix?

  George.

PS: Here is the patch I am currently using to get rid of the segfaults:

diff --git a/orte/tools/orterun/orterun.c b/orte/tools/orterun/orterun.c
index 85aba0a0f3..506b931d35 100644
--- a/orte/tools/orterun/orterun.c
+++ b/orte/tools/orterun/orterun.c
@@ -222,10 +222,10 @@ int orterun(int argc, char *argv[])
  DONE:
     /* cleanup and leave */
     orte_submit_finalize();
-    orte_finalize();
-    orte_session_dir_cleanup(ORTE_JOBID_WILDCARD);
     /* cleanup the process info */
     orte_proc_info_finalize();
+    orte_finalize();
+    orte_session_dir_cleanup(ORTE_JOBID_WILDCARD);
 
     if (orte_debug_flag) {
         fprintf(stderr, "exiting with status %d\n", orte_exit_status);
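
PPS: For anyone who wants to see the ordering problem in isolation, here is a minimal sketch. It is a toy model, not the real OPAL class system: demo_class_t, demo_buffer_t and all the helper names are hypothetical, made up for illustration. The toy guards against the stale class and reports 0; I am assuming the real release path has no such guard, which would explain why it crashes instead.

```c
#include <stddef.h>

/* Hypothetical stand-in for a class descriptor (NOT the real OPAL type). */
typedef struct demo_class {
    int initialized;               /* nonzero while the class is usable */
    void (*destruct)(void *obj);   /* only valid while initialized */
} demo_class_t;

/* Hypothetical object that depends on its class at release time. */
typedef struct demo_buffer {
    demo_class_t *cls;
    int released;
} demo_buffer_t;

static demo_class_t demo_buffer_class;

static void demo_buffer_destruct(void *obj)
{
    ((demo_buffer_t *)obj)->released = 1;
}

/* Plays the role of the framework init: the class becomes usable. */
static void demo_class_init(void)
{
    demo_buffer_class.initialized = 1;
    demo_buffer_class.destruct = demo_buffer_destruct;
}

/* Plays the role of the framework finalize: the class is torn down. */
static void demo_class_finalize(void)
{
    demo_buffer_class.initialized = 0;
    demo_buffer_class.destruct = NULL;
}

/* Returns 1 if the destructor ran, 0 if the class was already gone. */
static int demo_object_release(demo_buffer_t *buf)
{
    if (!buf->cls->initialized || buf->cls->destruct == NULL) {
        return 0;
    }
    buf->cls->destruct(buf);
    return buf->released;
}

/* Order used by the patch: release the objects, then finalize the class. */
int teardown_objects_first(void)
{
    demo_buffer_t buf = { &demo_buffer_class, 0 };
    int ok;

    demo_class_init();
    ok = demo_object_release(&buf);
    demo_class_finalize();
    return ok;
}

/* Original order: finalize the class, then try to release the objects. */
int teardown_class_first(void)
{
    demo_buffer_t buf = { &demo_buffer_class, 0 };

    demo_class_init();
    demo_class_finalize();
    return demo_object_release(&buf);
}
```

Calling teardown_objects_first() runs the destructor normally, while teardown_class_first() finds the class already finalized. That is the same dependency that makes orte_proc_info_finalize unsafe after orte_finalize.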