Never mind, I found the problem: the new field was not being packed and unpacked. A small patch to orte_dt_pack/unpack_fns solved it.
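For anyone hitting the same issue: orte-ps does not read the HNP's memory; it receives a packed copy of the job data. A field added to orte_proc_t therefore only reaches orte-ps if it is packed on the sending side and unpacked, in the same order, on the receiving side. Below is a minimal, self-contained sketch of that rule. All names in it (name_t, proc_t, buffer_t, pack_proc, unpack_proc) are hypothetical stand-ins, not the ORTE/DSS API, and the raw memcpy ignores the byte-order handling the real DSS performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the ORTE types; the real definitions differ. */
typedef struct {
    uint32_t jobid;
    uint32_t vpid;
} name_t;

typedef struct {
    name_t name;
    name_t protector;   /* the new field: must be packed AND unpacked */
} proc_t;

typedef struct {
    unsigned char data[256];
    size_t used;   /* bytes written by pack */
    size_t pos;    /* read cursor used by unpack */
} buffer_t;

static void put_u32(buffer_t *b, uint32_t v) {
    assert(b->used + sizeof v <= sizeof b->data);   /* no overflow */
    memcpy(b->data + b->used, &v, sizeof v);
    b->used += sizeof v;
}

static uint32_t get_u32(buffer_t *b) {
    uint32_t v;
    assert(b->pos + sizeof v <= b->used);           /* no over-read */
    memcpy(&v, b->data + b->pos, sizeof v);
    b->pos += sizeof v;
    return v;
}

/* Sender side (the HNP): every field the receiver needs goes into the buffer. */
static void pack_proc(buffer_t *b, const proc_t *p) {
    put_u32(b, p->name.jobid);
    put_u32(b, p->name.vpid);
    put_u32(b, p->protector.jobid);   /* omit these two lines ... */
    put_u32(b, p->protector.vpid);
}

/* Receiver side (orte-ps): unpack in exactly the same order as pack_proc. */
static void unpack_proc(buffer_t *b, proc_t *p) {
    p->name.jobid = get_u32(b);
    p->name.vpid = get_u32(b);
    p->protector.jobid = get_u32(b);  /* ... or these two, and the receiver */
    p->protector.vpid = get_u32(b);   /* keeps its default protector value */
}
```

If the protector lines are missing from either function, the receiver's protector keeps whatever default it was constructed with, which matches the symptom seen in orte-ps.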

Leonardo

Leonardo Fialho wrote:
Hi All,

I have a small question about how to update the orte_proc structure.

I have modified the orte_proc structure to include another field (of type orte_process_name_t) that identifies the node which stores my checkpoints and logs:

struct orte_proc_t {
...
#if OPAL_ENABLE_FT_RADIC == 1
   /* protector node */
   orte_process_name_t protector;
#endif
};

Then I added code to orted_comm.c that I expected to update the job structure:
/* Update the structure */
if (NULL == (jdata = orte_get_job_data_object(sender_jobid))) {
    ORTE_ERROR_LOG(ORTE_ERR_NOT_FOUND);
    goto CLEANUP;
}
procs = (orte_proc_t**)jdata->procs->addr;
if (NULL == procs[sender_vpid]) {
    ORTE_ERROR_LOG(ORTE_ERR_NOT_FOUND);
    goto CLEANUP;
}
procs[sender_vpid]->protector.jobid = protector_jobid;
procs[sender_vpid]->protector.vpid  = protector_vpid;
opal_output(0, "%s is the protector of %s",
            ORTE_NAME_PRINT(&procs[sender_vpid]->name),
            ORTE_NAME_PRINT(&procs[sender_vpid]->protector));

In the log of the orte daemon acting as HNP I can see the correct information that was added to the orte_proc structure, but when I use my modified version of orte-ps I get incorrect information ([[INVALID],INVALID]). Below is the code I use in orte-ps:

#if OPAL_ENABLE_FT_RADIC == 1
       protector = orte_util_print_name_args(&vpid->protector);
       printf("%*s |",   len_protector, protector);
#endif
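ORTE's name printer appears to render an invalid jobid or vpid as INVALID, so [[INVALID],INVALID] most likely means the protector in orte-ps's copy of the proc still held an invalid default, i.e. it was never filled in from the HNP's data. The sketch below is a simplified, self-contained illustration of that reading; name_t, JOBID_INVALID, VPID_INVALID, proc_name_init and format_name are hypothetical, and the real ORTE printer formats the jobid differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sentinels standing in for ORTE_JOBID_INVALID / ORTE_VPID_INVALID. */
#define JOBID_INVALID UINT32_MAX
#define VPID_INVALID  UINT32_MAX

typedef struct {
    uint32_t jobid;
    uint32_t vpid;
} name_t;

/* Hypothetical constructor step: a newly added name field starts out invalid. */
static void proc_name_init(name_t *n) {
    n->jobid = JOBID_INVALID;
    n->vpid = VPID_INVALID;
}

/* Simplified "[[jobid],vpid]" formatter that spells out invalid values. */
static void format_name(char *out, size_t len, const name_t *n) {
    char job[16], vp[16];
    assert(out != NULL && len > 0);
    if (n->jobid == JOBID_INVALID)
        snprintf(job, sizeof job, "INVALID");
    else
        snprintf(job, sizeof job, "%u", (unsigned)n->jobid);
    if (n->vpid == VPID_INVALID)
        snprintf(vp, sizeof vp, "INVALID");
    else
        snprintf(vp, sizeof vp, "%u", (unsigned)n->vpid);
    snprintf(out, len, "[[%s],%s]", job, vp);
}
```

A field that is initialized this way and never overwritten by unpacking prints exactly as [[INVALID],INVALID], while the HNP's own copy, which was updated in place, prints the real name.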

My question is: why does the HNP show the correct information while orte-ps does not?

Thanks


--
Leonardo Fialho
Computer Architecture and Operating Systems Department - CAOS
Universidad Autonoma de Barcelona - UAB
ETSE, Edificio Q, QC/3088
http://www.caos.uab.es
Phone: +34-93-581-2888
Fax: +34-93-581-2478
