So the ompi-checkpoint command connects to the Global Coordinator in
the SnapC 'full' component. The Global Coordinator lives in the HNP
(mpirun/orterun), as determined by the 'full' component. As a result, to
start a checkpoint, ompi-checkpoint must connect to the HNP.
From the user's standpoint, ompi-checkpoint is typically run from the
same machine where mpirun was started. So it made the most sense to
have these two connect to each other, especially since we ask the user
to provide the PID of the mpirun process to checkpoint.
That being said, with the proper changes to 'full' (or with a new
SnapC component), ompi-checkpoint could issue the checkpoint request
to any process in the MPI job (orterun, orted, or application processes)
and have the right thing happen.
I have received one request for this functionality, but have not yet
had the time to dig into it.
Does that help?
Cheers,
Josh
On Jan 31, 2008, at 9:51 AM, Leonardo Fialho wrote:
Hi all (and Josh),
Why does ompi-checkpoint have to contact the HNP specifically? If I use
another process to start the snapshot coordinator, it apparently
works fine, no?
PS: I prefer to send this message to the list, to keep it in the
archive for future reference.
--
Leonardo Fialho
Computer Architecture and Operating Systems Department - CAOS
Universidad Autonoma de Barcelona - UAB
ETSE, Edificio Q, QC/3088
http://www.caos.uab.es
Phone: +34-93-581-2888
Fax: +34-93-581-2478
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel