Hello all,

I'm trying to restart a child that has failed. Right now I catch the failed
child in the errmgr, pack it, and send it to another node, which has to
"adopt" it. Is there any way to do this with the current implementation,
something like an add_child? I ask because on the adopting node I will have
to do something like:

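opal_list_item_t *item;
orte_odls_job_t *jobdat = NULL;
orte_app_context_t *app;

/* look up the job data object for the failed child's job */
for (item = opal_list_get_first(&orte_local_jobdata);
     item != opal_list_get_end(&orte_local_jobdata);
     item = opal_list_get_next(item)) {
    orte_odls_job_t *jptr = (orte_odls_job_t*)item;
    if (jptr->jobid == child->name->jobid) {
        jobdat = jptr;
        break;
    }
}
if (NULL == jobdat) {
    /* the adopting node has no record of this job */
    return ORTE_ERR_NOT_FOUND;
}
app = jobdat->apps[child->app_idx];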

For this to work, the adopting node needs to have the child in its jobdat.
If nothing like this is implemented, could someone give me some advice on
how to do it?
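
To make the question concrete, here is a rough sketch of what I mean by
add_child. The types and the names sketch_job_t, sketch_child_t,
sketch_find_job and sketch_add_child are stand-ins I made up for
illustration. They are not the real ORTE structures or API, and the real
jobdat surely carries more state than this:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in types: NOT the real ORTE structures, just enough to show the idea. */
typedef struct sketch_child {
    uint32_t jobid;             /* job the process belongs to */
    uint32_t vpid;              /* rank of the process within that job */
    int app_idx;                /* index into the job's app context array */
    struct sketch_child *next;
} sketch_child_t;

typedef struct sketch_job {
    uint32_t jobid;
    const char **apps;          /* stand-in for the app context array */
    int num_apps;
    sketch_child_t *children;   /* children hosted (or adopted) by this node */
    struct sketch_job *next;
} sketch_job_t;

/* Return the local record for jobid, or NULL if this node has none. */
static sketch_job_t *sketch_find_job(sketch_job_t *jobs, uint32_t jobid)
{
    for (sketch_job_t *j = jobs; j != NULL; j = j->next) {
        if (j->jobid == jobid) {
            return j;
        }
    }
    return NULL;
}

/* Hypothetical add_child: attach the child to its job record, creating an
 * empty record first if this node never hosted the job.
 * Returns 0 on success, -1 if the record could not be allocated. */
static int sketch_add_child(sketch_job_t **jobs, sketch_child_t *child)
{
    assert(jobs != NULL && child != NULL);

    sketch_job_t *job = sketch_find_job(*jobs, child->jobid);
    if (job == NULL) {
        job = calloc(1, sizeof(*job));
        if (job == NULL) {
            return -1;
        }
        job->jobid = child->jobid;
        job->next = *jobs;      /* push the new record on the front */
        *jobs = job;
    }

    child->next = job->children;
    job->children = child;
    return 0;
}
```

Note that in this sketch a freshly created job record has no app contexts
(apps is NULL), so I suppose the app context would have to be packed and
sent along with the child as well before the lookup above can succeed.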

Best Regards.

Hugo Meyer
