Hello all. I'm trying to restart a child that has failed. Right now I'm catching the failed child in the errmgr, then packing the child and sending it to another node, which has to "adopt" it. Is there any way to do this with the current implementation? Something like add_child? Because then I will have to do something like:

    opal_list_item_t *item;
    orte_odls_job_t *jobdat;
    orte_app_context_t *app;

    for (item = opal_list_get_first(&orte_local_jobdata);
         item != opal_list_get_end(&orte_local_jobdata);
         item = opal_list_get_next(item)) {
        jobdat = (orte_odls_job_t*)item;
        if (jobdat->jobid == child->name->jobid) {
            break;
        }
    }
    app = jobdat->apps[child->app_idx];

In order to do this, I need to have the child in the jobdat. If nothing like this is implemented, could someone give me some advice on how to do it?

Best regards,
Hugo Meyer