Hello! I am working on some simulations where I have to perform periodic kill-restart and checkpointing of an MPI application.
Since a checkpoint can take place immediately after a restart, I need some way to know whether the ompi-restart of the application is complete. If I do not ensure that all application processes have finished restarting, ompi-checkpoint fails after throwing a slew of errors. Can someone please suggest a way to get some kind of notification indicating that the restart has completed (in the sense that checkpointing can safely be performed again)?

Thank you,
Kishor