Interesting - I see why. Please try this version.

Ralph

On Thu, Oct 15, 2015 at 4:05 AM, Mark Santcroos <mark.santcr...@rutgers.edu> wrote:

>
> > On 15 Oct 2015, at 4:38 , Ralph Castain <r...@open-mpi.org> wrote:
> > Okay, please try the attached patch.
>
> *scratch*
>
> Although I reported results with the patch earlier, I can't reproduce it anymore.
> Now orte-dvm shuts down after the first orte-submit completes with:
>
>
> [netbook:72038] [[9827,0],0] orted:comm:process_commands() Processing Command: ORTE_DAEMON_SPAWN_JOB_CMD
> [netbook:72038] [[9827,0],0] orted:comm:process_commands() Processing Command: ORTE_DAEMON_ADD_LOCAL_PROCS
> [netbook:72038] [[9827,0],0] Releasing job data for [INVALID]
> [netbook:72038] sess_dir_finalize: proc session dir does not exist
> [netbook:72038] [[9827,0],0] JOB [9827,1] HAS TERMINATED
> [netbook:72038] [[9827,0],0] NOTIFYING [[9826,0],0] OF JOB [9827,1] COMPLETION
> [netbook:72038] [[9827,0],0] JOB [9827,1] HAS TERMINATED
> [netbook:72038] [[9827,0],0] orted:comm:process_commands() Processing Command: ORTE_DAEMON_EXIT_CMD
> [netbook:72038] sess_dir_finalize: proc session dir does not exist
> [netbook:72038] sess_dir_cleanup: job session dir does not exist
> exiting with status 0
>
>
> (Earlier I maybe had an unpatched instance of orte-dvm still running and
> either the installation or some dynamic linking got messed up?!?!)
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: http://www.open-mpi.org/community/lists/devel/2015/10/18178.php

diff --git a/orte/mca/state/dvm/state_dvm.c b/orte/mca/state/dvm/state_dvm.c
index 0e7309c..5b1a841 100644
--- a/orte/mca/state/dvm/state_dvm.c
+++ b/orte/mca/state/dvm/state_dvm.c
@@ -267,6 +267,7 @@ void check_complete(int fd, short args, void *cbdata)
         if (jdata->state < ORTE_JOB_STATE_UNTERMINATED) {
             jdata->state = ORTE_JOB_STATE_TERMINATED;
         }
+        opal_output(0, "%s JOB %s HAS TERMINATED", ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), ORTE_JOBID_PRINT(jdata->jobid));
     }
 
     /* tell the IOF that the job is complete */
diff --git a/orte/tools/orte-dvm/orte-dvm.c b/orte/tools/orte-dvm/orte-dvm.c
index 3cdf585..f9a969a 100644
--- a/orte/tools/orte-dvm/orte-dvm.c
+++ b/orte/tools/orte-dvm/orte-dvm.c
@@ -462,6 +462,11 @@ static void notify_requestor(int sd, short args, void *cbdata)
     int ret;
     opal_buffer_t *reply;
 
+opal_output(0, "%s NOTIFYING %s OF JOB %s COMPLETION",
+            ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
+            ORTE_NAME_PRINT(&jdata->originator),
+            ORTE_JOBID_PRINT(jdata->jobid));
+
     /* notify the requestor */
     reply = OBJ_NEW(opal_buffer_t);
     /* see if there was any problem */
@@ -476,6 +481,7 @@ static void notify_requestor(int sd, short args, void *cbdata)
     /* we cannot cleanup the job object as we might
      * hit an error during transmission, so clean it
      * up in the send callback */
+    OBJ_RELEASE(caddy);
 }