Ralph,

I get consistent segfaults during infrastructure teardown in orterun (I
noticed them on OS X). After digging a little, it turns out that the
opal_buffer_t class has already been cleaned up in orte_finalize by the time
orte_proc_info_finalize is called, so the destructors end up running on
memory with arbitrary contents. If I change the teardown order to call
orte_proc_info_finalize before orte_finalize, things work better, but I
still get a very annoying warning about a "Bad file descriptor in select".

Any better fix?

George.

PS: Here is the patch I am currently using to get rid of the segfaults:

diff --git a/orte/tools/orterun/orterun.c b/orte/tools/orterun/orterun.c
index 85aba0a0f3..506b931d35 100644
--- a/orte/tools/orterun/orterun.c
+++ b/orte/tools/orterun/orterun.c
@@ -222,10 +222,10 @@ int orterun(int argc, char *argv[])
  DONE:
     /* cleanup and leave */
     orte_submit_finalize();
-    orte_finalize();
-    orte_session_dir_cleanup(ORTE_JOBID_WILDCARD);
     /* cleanup the process info */
     orte_proc_info_finalize();
+    orte_finalize();
+    orte_session_dir_cleanup(ORTE_JOBID_WILDCARD);

     if (orte_debug_flag) {
         fprintf(stderr, "exiting with status %d\n", orte_exit_status);