Forget it. I found the problem: a small patch to
orte_dt_pack/unpack_fns solved it.
Leonardo
Leonardo Fialho wrote:
Hi All,
I have a small question about how to update the orte_proc structure.
I have modified the orte_proc structure to include another field
(of type orte_process_name_t) that identifies the node that stores my
checkpoints and logs:
struct orte_proc_t {
...
#if OPAL_ENABLE_FT_RADIC == 1
/* protector node */
orte_process_name_t protector;
#endif
};
I then added the following code to orted_comm.c, which I expected to
update the job structure:
/* Update the structure */
if (NULL == (jdata = orte_get_job_data_object(sender_jobid))) {
ORTE_ERROR_LOG(ORTE_ERR_NOT_FOUND);
goto CLEANUP;
}
procs = (orte_proc_t**)jdata->procs->addr;
if (NULL == procs[sender_vpid] ) {
ORTE_ERROR_LOG(ORTE_ERR_NOT_FOUND);
goto CLEANUP;
}
procs[sender_vpid]->protector.jobid = protector_jobid;
procs[sender_vpid]->protector.vpid = protector_vpid;
opal_output(0, "%s is the protector of %s",
ORTE_NAME_PRINT(&procs[sender_vpid]->name),
ORTE_NAME_PRINT(&procs[sender_vpid]->protector));
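The update above amounts to a lookup in the job's process array, indexed by vpid, followed by a field assignment. Here is a minimal sketch of that step. It uses hypothetical simplified types (job_t, proc_t, set_protector), not the real ORTE/OPAL pointer-array API. It also adds a range check on the vpid as a precaution. That check is an assumption on my part; the original code does not have it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified types, NOT the real ORTE definitions. */
typedef struct { uint32_t jobid, vpid; } proc_name_t;
typedef struct { proc_name_t name; proc_name_t protector; } proc_t;

/* Hypothetical stand-in for a job's process array (jdata->procs). */
typedef struct {
    proc_t **procs;  /* slot i holds the process with vpid i, or NULL */
    size_t   size;   /* number of slots */
} job_t;

/* Record the protector of process `vpid`. Returns 0 on success, or -1 if
 * the vpid is out of range or its slot is empty (the "not found" cases). */
static int set_protector(job_t *job, uint32_t vpid, proc_name_t protector)
{
    assert(job != NULL);
    if ((size_t)vpid >= job->size || NULL == job->procs[vpid]) {
        return -1;
    }
    job->procs[vpid]->protector = protector;
    return 0;
}
```

This only changes the copy held by the process that runs it (here, the HNP). Other processes see the value only if it is transmitted to them.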
In the log of the ORTE daemon acting as the HNP I can see the correct
information that was added to the orte_proc structure, but when I
use my modified version of orte-ps I get incorrect information
([[INVALID],INVALID]). Below is the code I used in orte-ps:
#if OPAL_ENABLE_FT_RADIC == 1
protector = orte_util_print_name_args(&vpid->protector);
printf("%*s |", len_protector, protector);
#endif
My question is: why does the HNP show the correct information while
orte-ps does not?
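This symptom is consistent with the fix mentioned at the top of this message. As far as I understand it, orte-ps does not read the HNP's memory directly. It receives orte_proc_t data that the HNP packs and orte-ps unpacks. A field that the pack/unpack functions do not handle therefore never crosses the wire, and the receiver keeps its default (invalid) value. The sketch below illustrates that mechanism under stated assumptions. It uses hypothetical simplified types and a hand-rolled byte buffer, not the real opal_dss / orte_dt_pack/unpack_fns API. The `with_protector` flag exists only to contrast the old and patched behavior.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical simplified types, NOT the real ORTE definitions. */
typedef struct { uint32_t jobid, vpid; } proc_name_t;
typedef struct { proc_name_t name; proc_name_t protector; } proc_t;
#define NAME_INVALID 0xFFFFFFFFu

/* Append one 32-bit value at *off (host byte order; a real implementation
 * would also deal with heterogeneous endianness). */
static void put_u32(unsigned char *buf, size_t cap, size_t *off, uint32_t v)
{
    assert(*off + sizeof v <= cap);
    memcpy(buf + *off, &v, sizeof v);
    *off += sizeof v;
}

static uint32_t get_u32(const unsigned char *buf, size_t len, size_t *off)
{
    uint32_t v;
    assert(*off + sizeof v <= len);
    memcpy(&v, buf + *off, sizeof v);
    *off += sizeof v;
    return v;
}

/* Sender side (the HNP). with_protector == 0 mimics a pack function that
 * predates the new field. Returns the number of bytes written. */
static size_t pack_proc(const proc_t *p, unsigned char *buf, size_t cap,
                        int with_protector)
{
    size_t off = 0;
    put_u32(buf, cap, &off, p->name.jobid);
    put_u32(buf, cap, &off, p->name.vpid);
    if (with_protector) {
        put_u32(buf, cap, &off, p->protector.jobid);
        put_u32(buf, cap, &off, p->protector.vpid);
    }
    return off;
}

/* Receiver side (orte-ps). Starts from the default, so a field that is not
 * unpacked stays INVALID. Fields must be read in the same order as packed. */
static void unpack_proc(proc_t *p, const unsigned char *buf, size_t len,
                        int with_protector)
{
    size_t off = 0;
    p->protector.jobid = p->protector.vpid = NAME_INVALID;
    p->name.jobid = get_u32(buf, len, &off);
    p->name.vpid = get_u32(buf, len, &off);
    if (with_protector) {
        p->protector.jobid = get_u32(buf, len, &off);
        p->protector.vpid = get_u32(buf, len, &off);
    }
}
```

The key design point is symmetry. Whatever the pack function writes, the unpack function must read in the same order. Adding a field to the struct alone changes neither side.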
Thanks
--
Leonardo Fialho
Computer Architecture and Operating Systems Department - CAOS
Universidad Autonoma de Barcelona - UAB
ETSE, Edificio Q, QC/3088
http://www.caos.uab.es
Phone: +34-93-581-2888
Fax: +34-93-581-2478