Hi everyone,

Actually, my problem was solved after I built a simple plugin and ran it with dmtcp_launch. DMTCP manages the TCP connection between my MPI application and my external application (which runs inside my cluster) only if the plugin is loaded together with dmtcp_launch. For example:

- dmtcp_launch mpirun ...
  It doesn't work. DMTCP doesn't manage to drain the buffers.
- dmtcp_launch --with-plugin plugin.so mpirun ...
  It works fine!

Could you explain to me why it works with the plugin and doesn't work without it?

Thanks!

Edson

2015-10-29 2:44 GMT+01:00 Jiajun Cao <jia...@ccs.neu.edu>:

> Hi Edson,
>
> The best way to achieve this is to write a tiny plugin, and in
> dmtcp_event_hook(), shut down the connection at
> DMTCP_EVENT_THREADS_SUSPEND.
> Your application should expose an API to do this. Or, you can define a weak
> symbol, and use dlsym() and RTLD_NEXT to find the symbol in your app.
>
> Best,
> Jiajun
>
> On Wed, Oct 28, 2015 at 2:12 PM, Edson Tavares de Camargo <
> etcamarg...@gmail.com> wrote:
>
>> Hi, Jiajun
>>
>> > I guess what you want to do is to make sure the connection to the
>> outside world is shut down before DMTCP handles the network connection.
>>
>> Yes, that is exactly what I would like to do.
>>
>> > If that is the case, you should use the event
>> DMTCP_EVENT_THREADS_SUSPEND instead, and restore the connection at event
>> DMTCP_EVENT_THREADS_RESUME.
>>
>> But my question now is how I can get the event
>> DMTCP_EVENT_THREADS_SUSPEND inside my application.
>>
>> I have managed to get the event DMTCP_EVENT_THREADS_SUSPEND inside a
>> DMTCP plugin, but I still can't work out how to make the DMTCP
>> plugin send a message to my application, or even how to make the DMTCP
>> plugin call a function in my application (asking it to shut down the
>> connection).
>>
>> It would be very nice if I could have the dmtcp_event_hook inside my
>> application. Is it possible to do that? Is there another way to tell
>> my application to shut down the connection to the outside world just
>> before the checkpoint?
>>
>> Thanks!
>>
>> Edson
>>
>> 2015-10-27 20:45 GMT+01:00 Jiajun Cao <jia...@ccs.neu.edu>:
>>
>>> Hi Edson,
>>>
>>> DMTCP_EVENT_WRITE_CKPT corresponds to the event right at the time of
>>> writing the checkpoint images into storage. At this point, the processing
>>> of network connections is already finished. I guess what you want to do is
>>> to make sure the connection to the outside world is shut down before DMTCP
>>> handles the network connection. If that is the case, you should use the
>>> event DMTCP_EVENT_THREADS_SUSPEND instead, and restore the connection at
>>> event DMTCP_EVENT_THREADS_RESUME.
>>>
>>> Best,
>>> Jiajun
>>>
>>> On Tue, Oct 27, 2015 at 2:48 PM, Edson Tavares de Camargo <
>>> etcamarg...@gmail.com> wrote:
>>>
>>>> Hi Jiajun,
>>>>
>>>> Thank you for your answer.
>>>>
>>>> I use a TCP connection between the MPI application and a logger entity.
>>>> It is just to store some information about the MPI messages.
>>>>
>>>> I have realized that when the connection between the MPI application
>>>> and the logger is closed, DMTCP is able to make the checkpoint. Would
>>>> it be possible to close the connection just before the checkpoint
>>>> and restore it when the checkpoint has finished?
>>>>
>>>> I am trying to use the dmtcp_event_hook, but the event
>>>> DMTCP_EVENT_WRITE_CKPT seems to be called only after the following
>>>> warning message:
>>>>
>>>> [42000] WARNING at kernelbufferdrainer.cpp:124 in onTimeoutInterval;
>>>> REASON='JWARNING(false) failed'
>>>> _dataSockets[i]->socket().sockfd() = 14
>>>> buffer.size() = 0
>>>>
>>>> WARN_INTERVAL_SEC = 10
>>>> Message: Still draining socket... perhaps remote host is not running
>>>> under DMTCP?
>>>>
>>>> Is there a way to capture the event before that warning message?
>>>>
>>>> Thanks a lot!!!
>>>>
>>>> Edson
>>>> On Oct 26, 2015 5:25 PM, "Jiajun Cao" <jia...@ccs.neu.edu> wrote:
>>>>
>>>>> Hi Edson,
>>>>>
>>>>> The error is what's expected.
>>>>> DMTCP considers the computation as a
>>>>> whole, i.e., all processes involved in a computation must run
>>>>> under DMTCP. Technically, this is because DMTCP must handle the network
>>>>> communication. At the time of a checkpoint, DMTCP needs to drain the data
>>>>> in the sockets so that no in-flight data is lost. In your
>>>>> case, the other side of the socket is not under the control of DMTCP.
>>>>>
>>>>> Also, if possible, could you tell us what kind of application you are
>>>>> running? I haven't tested DMTCP on MPI applications communicating with the
>>>>> external world. This can be a good test suite for us.
>>>>>
>>>>> Best,
>>>>> Jiajun
>>>>>
>>>>> On Mon, Oct 26, 2015 at 6:46 AM, Edson Tavares de Camargo <
>>>>> etcamarg...@gmail.com> wrote:
>>>>>
>>>>>> Hi Everyone!
>>>>>>
>>>>>> I have a question: What is the expected behaviour of DMTCP when I use
>>>>>> DMTCP on an MPI application that exchanges messages with another
>>>>>> application that is not running under dmtcp_launch?
>>>>>>
>>>>>> I ask because I get an error when I execute an MPI application that
>>>>>> exchanges messages via TCP with another application. Both applications
>>>>>> are running on my cluster, but I only need to checkpoint the MPI
>>>>>> application. The error is the following:
>>>>>>
>>>>>> ========
>>>>>> WARNING at kernelbufferdrainer.cpp:120 in onTimeoutInterval;
>>>>>> REASON='JWARNING(false) failed'
>>>>>> _dataSockets[i]->socket().sockfd() = 15
>>>>>> buffer.size() = 1059
>>>>>> WARN_INTERVAL_SEC = 10
>>>>>> Message: Still draining socket... perhaps remote host is not running
>>>>>> under DMTCP?
>>>>>> =======
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> Edson
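
The approach Jiajun describes in the thread (close the logger connection at DMTCP_EVENT_THREADS_SUSPEND, reopen it at DMTCP_EVENT_THREADS_RESUME, and reach the application through a weak symbol) could be sketched roughly as below. This is only an illustration under stated assumptions, not the plugin Edson actually wrote: app_logger_disconnect and app_logger_reconnect are hypothetical functions that the MPI application would have to provide, and sketch_event_t is a stand-in for the real DMTCP event constants so that the code compiles without the DMTCP headers. In an actual plugin this dispatch would sit inside dmtcp_event_hook().

```c
#include <stddef.h>

/* Stand-in event type (an assumption made for this sketch). In a real
 * plugin the two events of interest are DMTCP_EVENT_THREADS_SUSPEND and
 * DMTCP_EVENT_THREADS_RESUME, taken from DMTCP's own header. */
typedef enum {
  SKETCH_EVENT_THREADS_SUSPEND,
  SKETCH_EVENT_THREADS_RESUME,
  SKETCH_EVENT_OTHER
} sketch_event_t;

/* Hypothetical application API, declared as weak symbols (GCC/Clang
 * extension). If the application does not define them, their addresses
 * are NULL and the plugin simply skips the calls. */
extern void app_logger_disconnect(void) __attribute__((weak));
extern void app_logger_reconnect(void) __attribute__((weak));

/* Dispatch one event. Returns 1 if an application function was called,
 * 0 if the event was ignored or the symbol was not found. */
int sketch_handle_event(sketch_event_t event)
{
  switch (event) {
  case SKETCH_EVENT_THREADS_SUSPEND:
    /* Close the external connection before sockets are drained. */
    if (app_logger_disconnect != NULL) {
      app_logger_disconnect();
      return 1;
    }
    return 0;
  case SKETCH_EVENT_THREADS_RESUME:
    /* Re-establish the connection once the checkpoint is done. */
    if (app_logger_reconnect != NULL) {
      app_logger_reconnect();
      return 1;
    }
    return 0;
  default:
    return 0;
  }
}
```

Because the symbols are weak, the same code loads even when the application does not define them. For a shared-object plugin to see functions defined in the main executable, the executable generally has to export them (for example by linking with -rdynamic).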
------------------------------------------------------------------------------
_______________________________________________
Dmtcp-forum mailing list
Dmtcp-forum@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dmtcp-forum