You should never access a pointer array's data area that way (i.e., by index 
against the raw data). You really should do:

if (NULL == (proc = (orte_proc_t*)opal_pointer_array_get_item(jdata->procs, 
vpid))) {
      /* error report */
}

to protect against changes. The errmgr generally doesn't remove a process 
object upon failure - it just sets its state to some appropriate value. 
However, depending upon where you are trying to do this, and the history that 
got you down this code path, it is possible.

Also, remember that if you are in a daemon, then the jdata objects are not 
populated. The daemons work exclusively from the orte_local_jobdata and 
orte_local_children lists, so you would have to find your process there.

We might change that someday, but my first attempt at doing so ran into a 
snarled mess.

On Mar 21, 2011, at 12:40 PM, Hugo Meyer wrote:

> Hello @ll.
> 
> I'm having a problem when i try to access to data->procs->addr[vpid] when the 
> vpid belong to a recently killed process. I'm sending here a piece of my 
> code. The problem is that the execution is always entering in the last if 
> clause maybe because the information of the dead process is no longer 
> available, or maybe i'm doing something wrong when accessing.
> 
> Any help will be apreciated.
> 
>                                         command = 
> ORTE_DAEMON_REPORT_JOB_INFO_CMD;
>                                         buffer = OBJ_NEW(opal_buffer_t);
>                                         if (ORTE_SUCCESS != (rc = 
> opal_dss.pack(buffer, &command, 1, ORTE_DAEMON_CMD))) {
>                                             ORTE_ERROR_LOG(rc);
>                                             OBJ_RELEASE(buffer);
>                                             return rc;
>                                         }
>                                         if (ORTE_SUCCESS != (rc = 
> opal_dss.pack(buffer, &proc->jobid, 1, ORTE_JOBID))) {
>                                             ORTE_ERROR_LOG(rc);
>                                             OBJ_RELEASE(buffer);
>                                             return rc;
>                                         }
>                                         /* do the send */
>                                         if (0 > (rc = 
> orte_rml.send_buffer(ORTE_PROC_MY_HNP, buffer, ORTE_RML_TAG_DAEMON, 0))) {
>                                             ORTE_ERROR_LOG(rc);
>                                             OBJ_RELEASE(buffer);
>                                             return rc;
>                                         }
>                                         OBJ_RELEASE(buffer);
>                                         buffer = OBJ_NEW(opal_buffer_t);
> 
>                                         
> orte_rml.recv_buffer(ORTE_NAME_WILDCARD, buffer, ORTE_RML_TAG_TOOL, 0);
>                                         
>                                         opal_dss.unpack(buffer, &response, 
> &n, OPAL_INT32);
> 
>                                         if(response==0){
>                                             OPAL_OUTPUT_VERBOSE((5, 
> orte_errmgr_base.output,"NO ESCRIBÍ AL HNP\n "));
>                                         }else{
>                                             opal_dss.unpack(buffer, &jdata, 
> &n, ORTE_JOB);
>                                         }
> 
>                                         procs = 
> (orte_proc_t**)jdata->procs->addr;
>                                         if(procs==NULL){
>                                             OPAL_OUTPUT_VERBOSE((5, 
> orte_errmgr_base.output, "grave: procs==null"));
>                                         }
> 
>                                         command = 
> ORTE_DAEMON_UPDATE_STATE_CMD;
> 
>                                         OBJ_RELEASE(buffer);
>                                         buffer = OBJ_NEW(opal_buffer_t);
>                                         
>                                         if (ORTE_SUCCESS != (rc = 
> opal_dss.pack(buffer, &command, 1, ORTE_DAEMON_CMD))) {
>                                             ORTE_ERROR_LOG(rc);
>                                             OBJ_RELEASE(buffer);
>                                             goto CLEANUP;
>                                         }
> 
>                                         orte_proc_state_t state = 
> ORTE_PROC_STATE_FAULT;
>                                         /* Pack the faulty vpid */
>                                         if (ORTE_SUCCESS != (rc = 
> opal_dss.pack(buffer, &proc, 1, ORTE_NAME))) {
>                                             ORTE_ERROR_LOG(rc);
>                                             goto CLEANUP;
>                                         }
> 
>                                         /* Pack the state */
>                                         if (ORTE_SUCCESS != (rc = 
> opal_dss.pack(buffer, &state, 1, OPAL_UINT16))) {
>                                             ORTE_ERROR_LOG(rc);
>                                             goto CLEANUP;
>                                         }
> 
>                                         if (NULL == procs[proc->vpid] || NULL 
> == procs[proc->vpid]->node) {
>                                             OPAL_OUTPUT_VERBOSE((5, 
> orte_errmgr_base.output, "PROBLEM: procs[proc.vpid]==null"));
>                                         }
> 
> Thanks a lot.
> 
> Hugo Meyer
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

Reply via email to