Hi All,

I'm trying to restart a process from a previous checkpoint, using my (modified) orted. It uses the opal-restart command, but after the CRS calls cr_restart (crs:blcr: blcr_restart: SELF: exec :(cr_restart, cr_restart /tmp/radic//1/ompi_blcr_context.6507)), the OS freezes (kernel panic). The error generated at that moment is:

"Restart failed: No such device or address"

I think it may be caused by the stdin/stdout/stderr recorded in the checkpoint file pointing to undefined descriptors, or something like that.

Can anybody help with this? How can I close these descriptors before the checkpoint? Does opal-restart open these descriptors too? What can I do to make it work?

Thanks,
--

Leonardo Fialho
Computer Architecture and Operating Systems Department - CAOS
Universidad Autonoma de Barcelona - UAB
ETSE, Edificio Q, QC/3088
http://www.caos.uab.es
Phone: +34-93-581-2888
Fax: +34-93-581-2478