The ft_event() function that you mentioned is part of the larger fault
tolerance infrastructure in Open MPI. Make sure to enable it before
use: if it is not enabled, many of the ft_event functions default to
NULL. Add '--with-ft=cr' to your ./configure line to enable the FT
infrastructure.
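To illustrate why the NULL default matters, here is a minimal, self-contained sketch. It is not Open MPI code: demo_component_t, demo_notify(), and the state constants are hypothetical names made up for this example. It only shows the general pattern of guarding an optional ft_event-style function pointer before calling it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state codes for this sketch; not the real Open MPI values. */
enum { DEMO_FT_CHECKPOINT = 1, DEMO_FT_CONTINUE = 2 };

/* Hypothetical return code meaning "component has no ft_event handler". */
enum { DEMO_HANDLED = 0, DEMO_SKIPPED = 1 };

/* Stand-in for an ft_event-style hook. */
typedef int (*demo_ft_event_fn_t)(int state);

/* Stand-in for a component whose ft_event pointer may be NULL,
 * as happens when FT support was not enabled at configure time. */
typedef struct {
    const char *name;
    demo_ft_event_fn_t ft_event;
} demo_component_t;

/* Records the last state seen by the example handler. */
int demo_last_state = 0;

/* Example handler: remembers the state and reports success. */
int demo_btl_ft_event(int state)
{
    demo_last_state = state;
    return DEMO_HANDLED;
}

/* Call the component's handler only if one is installed. */
int demo_notify(const demo_component_t *c, int state)
{
    assert(c != NULL);
    if (c->ft_event == NULL) {
        return DEMO_SKIPPED;   /* nothing to call, so skip it */
    }
    return c->ft_event(state);
}
```

With a component whose ft_event is NULL, demo_notify() returns DEMO_SKIPPED and no handler runs. With demo_btl_ft_event installed, the handler is called and sees the state.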
As Jeff mentioned, you might be able to use the Checkpoint/Restart
Coordination Protocol (CRCP) framework [located in ompi/mca/crcp] to
halt messaging. It works as a wrapper around the PML, so you operate
on whole MPI messages rather than on fragments as in the BTLs below it.
It might be another option to consider.
-- Josh
On Jan 11, 2010, at 5:08 PM, Jeff Squyres wrote:
Additionally, I believe that the FT system already does something
like what you describe (although perhaps not exactly the same thing)
-- there is a phase where the FT system pauses and quiesces all BTLs.
Did you look at that part of the code, perchance, and see if it
meets your needs?
On Jan 11, 2010, at 3:53 PM, Christoph Konersmann wrote:
Thanks a lot for your help! I will give it a try.
Christoph
Ralph Castain wrote:
You've got this a tad wrong, but that's okay - let me try to
clarify a couple of things that may help.
First, you don't want to add this as a separate orted command. As
you noted, orte has no direct way to tell the OMPI layer to do
anything. Instead, you want to pass a message to the process that
is received in the OMPI layer. That is easy to do.
1. add a message tag in ompi/mca/dpm/dpm.h - perhaps something
like OMPI_RML_TAG_BTL_CTL
2. in the btl, add a call to orte_rml.recv_nb() that identifies
the above tag and specifies a callback function to use when such a
message arrives
3. in that callback function, toggle your "paused" flag - or you
can unpack the buffer to get a flag telling you what value to set.
Your choice.
Now, when you want to pause the BTL, you do an
orte_grpcomm.xcast() to the above message tag. ORTE will deliver
that message to every process, which will then have its callback
function called.
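The steps above can be sketched roughly as follows. This is a self-contained simulation, not real ORTE/OMPI code: demo_recv_nb() and demo_xcast() are hypothetical stand-ins for orte_rml.recv_nb() and orte_grpcomm.xcast(), their signatures are invented for the example, and DEMO_TAG_BTL_CTL is an arbitrary value. It only shows the shape of the pattern: register a callback for a tag, then broadcast a flag that each process's callback uses to set "paused".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Step 1 (assumed value): a message tag reserved for BTL control. */
#define DEMO_TAG_BTL_CTL 42
#define DEMO_MAX_PROCS 4

/* Stand-in callback type; NOT the real ORTE callback signature. */
typedef void (*demo_recv_cb_t)(int tag, const void *buf, size_t len,
                               void *cbdata);

/* One simulated process: its "paused" flag and its posted receive. */
typedef struct {
    bool paused;
    int tag;
    demo_recv_cb_t cb;
} demo_proc_t;

demo_proc_t demo_procs[DEMO_MAX_PROCS];

/* Step 3: unpack a one-byte flag from the buffer and set "paused". */
void demo_btl_ctl_cb(int tag, const void *buf, size_t len, void *cbdata)
{
    demo_proc_t *proc = (demo_proc_t *)cbdata;
    (void)tag;
    if (buf != NULL && len >= 1) {
        proc->paused = (*(const unsigned char *)buf != 0);
    }
}

/* Step 2: post a receive for the tag with a callback. */
void demo_recv_nb(demo_proc_t *proc, int tag, demo_recv_cb_t cb)
{
    assert(cb != NULL);
    proc->tag = tag;
    proc->cb = cb;
}

/* Stand-in for the xcast: deliver the buffer to every process that
 * posted a receive for this tag. Returns how many were reached. */
int demo_xcast(int tag, const void *buf, size_t len)
{
    int delivered = 0;
    for (int i = 0; i < DEMO_MAX_PROCS; i++) {
        demo_proc_t *p = &demo_procs[i];
        if (p->cb != NULL && p->tag == tag) {
            p->cb(tag, buf, len, p);
            delivered++;
        }
    }
    return delivered;
}

/* Every simulated process starts unpaused and posts its receive. */
void demo_init(void)
{
    for (int i = 0; i < DEMO_MAX_PROCS; i++) {
        demo_procs[i].paused = false;
        demo_recv_nb(&demo_procs[i], DEMO_TAG_BTL_CTL, demo_btl_ctl_cb);
    }
}

/* Pause or resume all processes by broadcasting a one-byte flag. */
int demo_set_paused_everywhere(bool paused)
{
    unsigned char flag = paused ? 1 : 0;
    return demo_xcast(DEMO_TAG_BTL_CTL, &flag, sizeof flag);
}
```

After demo_init(), calling demo_set_paused_everywhere(true) sets the flag in every simulated process, and calling it with false clears it again. Sending the value in the buffer, rather than toggling, keeps the processes consistent even if a message is delivered twice.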
HTH
Ralph
--
Paderborn Center for Parallel Computing - PC2
University of Paderborn - Germany
http://www.pc2.de
Christoph Konersmann <c...@upb.de>
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
--
Jeff Squyres
jsquy...@cisco.com