Hi, Jiajun

> I guess what you want to do is to make sure the connection to the outside world is shut down before DMTCP handles the network connection.

Yes, that is exactly what I would like to do.

> If that is the case, you should use the event DMTCP_EVENT_THREADS_SUSPEND instead, and restore the connection at event DMTCP_EVENT_THREADS_RESUME.

But my question now is: how can I get the event DMTCP_EVENT_THREADS_SUSPEND inside my application? I have managed to get the event DMTCP_EVENT_THREADS_SUSPEND inside a DMTCP plugin, but I still can't understand how to make the DMTCP plugin send a message to my application, or even make the DMTCP plugin call a function in my application (asking it to shut down the connection).

It would be very nice if I could have the dmtcp_event_hook inside my application. Is it possible to do that? Is there another way to tell my application to shut down the connection to the outside world just before the checkpoint?

Thanks!
Edson

2015-10-27 20:45 GMT+01:00 Jiajun Cao <jia...@ccs.neu.edu>:
> Hi Edson,
>
> DMTCP_EVENT_WRITE_CKPT corresponds to the event right at the time of
> writing the checkpoint images into storage. At this point, the processing
> of network connections is already finished. I guess what you want to do is
> to make sure the connection to the outside world is shut down before DMTCP
> handles the network connection. If that is the case, you should use the
> event DMTCP_EVENT_THREADS_SUSPEND instead, and restore the connection at
> event DMTCP_EVENT_THREADS_RESUME.
>
> Best,
> Jiajun
>
> On Tue, Oct 27, 2015 at 2:48 PM, Edson Tavares de Camargo <
> etcamarg...@gmail.com> wrote:
>
>> Hi Jiajun,
>>
>> Thank you for your answer.
>>
>> I use a TCP connection between the MPI application and a logger entity.
>> It is just to store some information about the MPI messages.
>>
>> I have realized that when the connection between the MPI application and
>> the logger ends, DMTCP is able to take the checkpoint. Would it be
>> possible to close the connection just before the checkpoint and
>> restore it when the checkpoint has finished?
>>
>> I am trying to use the dmtcp_event_hook, but the event
>> DMTCP_EVENT_WRITE_CKPT seems to be called only after the following
>> warning message:
>>
>> [42000] WARNING at kernelbufferdrainer.cpp:124 in onTimeoutInterval;
>> REASON='JWARNING(false) failed'
>> _dataSockets[i]->socket().sockfd() = 14
>> buffer.size() = 0
>>
>> WARN_INTERVAL_SEC = 10
>> Message: Still draining socket... perhaps remote host is not running
>> under DMTCP?
>>
>> Is there a way to capture the event before that warning message?
>>
>> Thanks a lot!!!
>>
>> Edson
>> On Oct 26, 2015 5:25 PM, "Jiajun Cao" <jia...@ccs.neu.edu> wrote:
>>
>>> Hi Edson,
>>>
>>> The error is what's expected. DMTCP considers the computation as a
>>> whole, i.e., all processes involved in a computation must run
>>> under DMTCP. Technically, this is because DMTCP must handle the network
>>> communication. At the time of a checkpoint, DMTCP needs to drain the data
>>> in the sockets so that no in-flight data is lost. In your
>>> case, the other side of the socket is not under the control of DMTCP.
>>>
>>> Also, if possible, could you tell us what kind of application you are
>>> running? I haven't tested DMTCP on MPI applications communicating with the
>>> external world. This can be a good test suite for us.
>>>
>>>
>>> Best,
>>> Jiajun
>>>
>>> On Mon, Oct 26, 2015 at 6:46 AM, Edson Tavares de Camargo <
>>> etcamarg...@gmail.com> wrote:
>>>
>>>> Hi Everyone!
>>>>
>>>>
>>>> I have a question: What is the expected behaviour of DMTCP when I use
>>>> DMTCP on an MPI application that exchanges messages with another
>>>> application that is not running under dmtcp_launch?
>>>>
>>>> I ask because I get an error when I execute an MPI application that
>>>> exchanges messages via TCP with another application. Both applications
>>>> are running on my cluster, but I only need to checkpoint the MPI
>>>> application.
>>>> The error is the following:
>>>>
>>>> ========
>>>> WARNING at kernelbufferdrainer.cpp:120 in onTimeoutInterval;
>>>> REASON='JWARNING(false) failed'
>>>> _dataSockets[i]->socket().sockfd() = 15
>>>> buffer.size() = 1059
>>>> WARN_INTERVAL_SEC = 10
>>>> Message: Still draining socket... perhaps remote host is not running
>>>> under DMTCP?
>>>> =======
>>>>
>>>> Thanks!
>>>>
>>>> Edson
>>>> -------
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> _______________________________________________
>>>> Dmtcp-forum mailing list
>>>> Dmtcp-forum@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
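On Edson's question of how a plugin can reach code in the application: because a DMTCP plugin is loaded into the same process as the application, one common pattern is a callback that the application registers at startup and that the plugin's event hook invokes on DMTCP_EVENT_THREADS_SUSPEND and DMTCP_EVENT_THREADS_RESUME, as Jiajun suggested. The following is a minimal sketch of that pattern only, under that assumption. `plugin_register_callbacks`, `plugin_handle_event`, the `SKETCH_EVENT_*` values, and the `logger_connected` flag are hypothetical names, not part of DMTCP; the event values are local stand-ins so the sketch compiles without dmtcp.h. In a real plugin, the body of `plugin_handle_event` would live inside `dmtcp_event_hook` and switch on the real DMTCP event constants.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for DMTCP_EVENT_THREADS_SUSPEND / _RESUME so this
 * sketch compiles without dmtcp.h. A real plugin would use the constants
 * from dmtcp.h inside dmtcp_event_hook(). */
typedef enum {
  SKETCH_EVENT_THREADS_SUSPEND,
  SKETCH_EVENT_THREADS_RESUME,
  SKETCH_EVENT_OTHER
} sketch_event_t;

typedef void (*ckpt_callback_t)(void);

/* Plugin side: callbacks registered by the application (NULL = none). */
static ckpt_callback_t g_before_ckpt = NULL;
static ckpt_callback_t g_after_ckpt = NULL;

/* Hypothetical function the plugin exports for the application to call. */
void plugin_register_callbacks(ckpt_callback_t before, ckpt_callback_t after) {
  g_before_ckpt = before;
  g_after_ckpt = after;
}

/* What the plugin's event hook would do for each event it receives. */
void plugin_handle_event(sketch_event_t event) {
  switch (event) {
    case SKETCH_EVENT_THREADS_SUSPEND:
      if (g_before_ckpt != NULL) g_before_ckpt(); /* e.g. close logger socket */
      break;
    case SKETCH_EVENT_THREADS_RESUME:
      if (g_after_ckpt != NULL) g_after_ckpt();   /* e.g. reconnect to logger */
      break;
    default:
      break; /* other events are ignored in this sketch */
  }
}

/* Application side: a flag stands in for the TCP connection to the logger. */
int logger_connected = 1;

void close_logger_connection(void) {
  logger_connected = 0;
  puts("logger connection closed before checkpoint");
}

void reopen_logger_connection(void) {
  logger_connected = 1;
  puts("logger connection reopened after checkpoint");
}
```

In the real setup the application would call `plugin_register_callbacks(close_logger_connection, reopen_logger_connection)` once at startup. One option, again an assumption rather than anything DMTCP prescribes, is to declare that function with `__attribute__((weak))` in the application and call it only if its address is non-NULL, so the program still runs when launched without the plugin. Keeping the callbacks small (just closing or reopening the socket) seems wise, since they run inside DMTCP's checkpoint sequence.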
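Jiajun's explanation of the "Still draining socket..." warning can be illustrated with a simplified model of draining: before checkpointing, one side reads whatever is still buffered in the socket until it sees an end-of-data marker sent by the peer's DMTCP. A peer that is not running under DMTCP never sends that marker, so the drainer keeps waiting and warns periodically. This sketch only illustrates that idea under those assumptions; it is not the code in kernelbufferdrainer.cpp, and the marker string, `drain_until_marker`, and its timeout parameters are hypothetical.

```c
#include <errno.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical end-of-data marker; DMTCP uses its own internal value. */
static const char DRAIN_MARKER[] = "<DRAIN-DONE>";

/* Reads everything buffered on fd into buf until DRAIN_MARKER arrives.
 * Returns the number of payload bytes saved (marker excluded), or -1 if the
 * marker does not arrive within max_waits poll timeouts of wait_ms each,
 * i.e. the situation DMTCP warns about (peer probably not under DMTCP). */
ssize_t drain_until_marker(int fd, char *buf, size_t cap, int wait_ms,
                           int max_waits) {
  size_t len = 0;
  const size_t mlen = sizeof(DRAIN_MARKER) - 1;
  int waits = 0;
  for (;;) {
    /* Done once the data read so far ends with the marker. */
    if (len >= mlen && memcmp(buf + len - mlen, DRAIN_MARKER, mlen) == 0)
      return (ssize_t)(len - mlen);
    if (len == cap) return -1; /* buffer full, marker never seen */
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ready = poll(&pfd, 1, wait_ms);
    if (ready < 0) {
      if (errno == EINTR) continue;
      return -1;
    }
    if (ready == 0) {
      /* Timeout: the point where a "Still draining socket..." warning fits. */
      if (++waits >= max_waits) return -1;
      continue;
    }
    ssize_t n = read(fd, buf + len, cap - len);
    if (n < 0) {
      if (errno == EINTR || errno == EAGAIN) continue;
      return -1;
    }
    if (n == 0) return -1; /* peer closed without sending the marker */
    len += (size_t)n;
  }
}
```

With a cooperating peer that writes its pending data followed by the marker, the function returns the number of in-flight bytes it saved; with a peer that only ever sends ordinary data, such as the external logger in this thread, it gives up after the timeouts. A real drainer also needs to put the saved bytes back once the checkpoint is written; that step is omitted here.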