Hi Edson,

The best way to achieve this is to write a small plugin and, in dmtcp_event_hook(), shut down the connection when the event DMTCP_EVENT_THREADS_SUSPEND arrives. Your application should expose an API that the plugin can call to do this. Alternatively, you can define a weak symbol, and use dlsym() with RTLD_NEXT to find the symbol in your application.
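To illustrate the weak-symbol idea, here is a minimal sketch, not a complete DMTCP plugin. The names app_close_logger_connection(), app_reopen_logger_connection(), sketch_event_t and sketch_handle_event() are hypothetical and exist only so the example compiles without dmtcp.h. In a real plugin the same checks would sit inside dmtcp_event_hook() and switch on DMTCP_EVENT_THREADS_SUSPEND and DMTCP_EVENT_THREADS_RESUME. The sketch assumes GCC or Clang on an ELF platform such as Linux, where an undefined weak function has a NULL address.

```c
#include <stddef.h>

/* Stand-in event names so this sketch compiles without dmtcp.h.
 * Assumption: a real plugin uses the DMTCP_EVENT_* values and the
 * dmtcp_event_hook() signature from its DMTCP version's dmtcp.h. */
typedef enum {
  SKETCH_EVENT_THREADS_SUSPEND,
  SKETCH_EVENT_THREADS_RESUME,
  SKETCH_EVENT_OTHER
} sketch_event_t;

/* Hypothetical application API. The declarations are weak, so the plugin
 * still links and loads when the application does not define them; in
 * that case the function addresses compare equal to NULL. */
extern void app_close_logger_connection(void) __attribute__((weak));
extern void app_reopen_logger_connection(void) __attribute__((weak));

/* Dispatch one event. Returns 1 if an application function was called,
 * 0 if the event was ignored or the application function is absent. */
int sketch_handle_event(sketch_event_t event)
{
  switch (event) {
  case SKETCH_EVENT_THREADS_SUSPEND:
    /* Close the external connection before the checkpoint proceeds. */
    if (app_close_logger_connection != NULL) {
      app_close_logger_connection();
      return 1;
    }
    return 0;
  case SKETCH_EVENT_THREADS_RESUME:
    /* Re-establish the connection once the threads resume. */
    if (app_reopen_logger_connection != NULL) {
      app_reopen_logger_connection();
      return 1;
    }
    return 0;
  default:
    return 0;
  }
}
```

If the application defines the two functions, the plugin calls them; if not, the events are silently ignored. When the plugin is a shared library and the functions live in the main executable, you may need to link the application with -rdynamic so that its symbols are visible to the plugin.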
Best,
Jiajun

On Wed, Oct 28, 2015 at 2:12 PM, Edson Tavares de Camargo <etcamarg...@gmail.com> wrote:

> Hi, Jiajun
>
> > I guess what you want to do is to make sure the connection to the
> > outside world is shut down before DMTCP handles the network connection.
>
> Yes, that is exactly what I would like to do.
>
> > If that is the case, you should use the event
> > DMTCP_EVENT_THREADS_SUSPEND instead, and restore the connection at event
> > DMTCP_EVENT_THREADS_RESUME.
>
> But my question now is how I can get the event DMTCP_EVENT_THREADS_SUSPEND
> inside my application?
>
> I have managed to get the event DMTCP_EVENT_THREADS_SUSPEND inside a
> DMTCP plugin, but I still can't understand how to make the DMTCP
> plugin send a message to my application, or even how to make the DMTCP
> plugin call a function in my application (asking it to shut down the
> connection).
>
> It would be very nice if I could have the dmtcp_event_hook inside my
> application. Is it possible to do that? Is there another way to tell
> my application to shut down the connection to the outside world just
> before the checkpoint?
>
> Thanks!
>
> Edson
>
> 2015-10-27 20:45 GMT+01:00 Jiajun Cao <jia...@ccs.neu.edu>:
>
>> Hi Edson,
>>
>> DMTCP_EVENT_WRITE_CKPT corresponds to the event right at the time of
>> writing the checkpoint images into storage. At this point, the processing
>> of network connections is already finished. I guess what you want to do is
>> to make sure the connection to the outside world is shut down before DMTCP
>> handles the network connection. If that is the case, you should use the
>> event DMTCP_EVENT_THREADS_SUSPEND instead, and restore the connection at
>> event DMTCP_EVENT_THREADS_RESUME.
>>
>> Best,
>> Jiajun
>>
>> On Tue, Oct 27, 2015 at 2:48 PM, Edson Tavares de Camargo
>> <etcamarg...@gmail.com> wrote:
>>
>>> Hi Jiajun,
>>>
>>> Thank you for your answer.
>>>
>>> I use a TCP connection between the MPI application and a logger entity.
>>> It is just to store some information about the MPI messages.
>>>
>>> I have realized that when the connection between the MPI application and
>>> the logger is closed, DMTCP is able to take the checkpoint. Would it be
>>> possible to close the connection just before the checkpoint and
>>> restore it when the checkpoint has finished?
>>>
>>> I am trying to use the dmtcp_event_hook, but the event
>>> DMTCP_EVENT_WRITE_CKPT seems to be delivered only after the following
>>> warning message:
>>>
>>> [42000] WARNING at kernelbufferdrainer.cpp:124 in onTimeoutInterval;
>>> REASON='JWARNING(false) failed'
>>> _dataSockets[i]->socket().sockfd() = 14
>>> buffer.size() = 0
>>> WARN_INTERVAL_SEC = 10
>>> Message: Still draining socket... perhaps remote host is not running
>>> under DMTCP?
>>>
>>> Is there a way to capture the event before that warning message?
>>>
>>> Thanks a lot!!!
>>>
>>> Edson
>>>
>>> On Oct 26, 2015 5:25 PM, "Jiajun Cao" <jia...@ccs.neu.edu> wrote:
>>>
>>>> Hi Edson,
>>>>
>>>> The error is what's expected. DMTCP considers the computation as a
>>>> whole, i.e., all processes involved in a computation must run
>>>> under DMTCP. Technically, this is because DMTCP must handle the network
>>>> communication. At the time of a checkpoint, DMTCP needs to drain the data
>>>> in the sockets so that no in-flight data is lost. In your
>>>> case, the other side of the socket is not under the control of DMTCP.
>>>>
>>>> Also, if possible, could you tell us what kind of application you are
>>>> running? I haven't tested DMTCP on MPI applications communicating with the
>>>> external world. This can be a good test suite for us.
>>>>
>>>> Best,
>>>> Jiajun
>>>>
>>>> On Mon, Oct 26, 2015 at 6:46 AM, Edson Tavares de Camargo
>>>> <etcamarg...@gmail.com> wrote:
>>>>
>>>>> Hi Everyone!
>>>>>
>>>>> I have a question: What is the expected behaviour of DMTCP when I use
>>>>> DMTCP on an MPI application that exchanges messages with another
>>>>> application that is not running under dmtcp_launch?
>>>>>
>>>>> I ask because I get an error when I execute an MPI application that
>>>>> exchanges messages via TCP with another application. Both applications are
>>>>> running on my cluster, but I only need to checkpoint the MPI
>>>>> application. The error is the following:
>>>>>
>>>>> ========
>>>>> WARNING at kernelbufferdrainer.cpp:120 in onTimeoutInterval;
>>>>> REASON='JWARNING(false) failed'
>>>>> _dataSockets[i]->socket().sockfd() = 15
>>>>> buffer.size() = 1059
>>>>> WARN_INTERVAL_SEC = 10
>>>>> Message: Still draining socket... perhaps remote host is not running
>>>>> under DMTCP?
>>>>> =======
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Edson
------------------------------------------------------------------------------
_______________________________________________ Dmtcp-forum mailing list Dmtcp-forum@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dmtcp-forum