Thanks, Ralph, for your reply.

2011/3/21 Ralph Castain <r...@open-mpi.org>
> You should never access a pointer array's data area that way (i.e., by
> index against the raw data). You really should do:
>
> if (NULL == (proc = (orte_proc_t*)opal_pointer_array_get_item(jdata->procs,
> vpid))) {
>     /* error report */
> }
>

About this: I've changed my code accordingly, but I'm getting the same result: NULL when asking about a dead process.

> The errmgr generally doesn't remove a process object upon failure - it just
> sets its state to some appropriate value. However, depending upon where you
> are trying to do this, and the history that got you down this code path, it
> is possible.
>

I'm writing this code in errmgr_orted.c, and it is executed when a process fails.

>
> Also, remember that if you are in a daemon, then the jdata objects are not
> populated. The daemons work exclusively from the orte_local_jobdata and
> orte_local_children lists, so you would have to find your process there.
>

That's why I'm asking the HNP for the jdata using ORTE_DAEMON_REPORT_JOB_INFO_CMD; I assume it has the information about the dead process.

Any idea?

Best regards.

Hugo Meyer