Hi Edson,

I suppose you wrote the plugin based on what I suggested earlier, i.e., shutting down the connection to the outside world when the threads are suspended. Your plugin library is preloaded by DMTCP, and its event hook is called at each event (DMTCP_EVENT_THREADS_SUSPEND, DMTCP_EVENT_WRITE_CKPT, etc.) in sequence. In your case, at DMTCP_EVENT_THREADS_SUSPEND your plugin shuts down the TCP connection, so DMTCP doesn't need to handle it at later events (by then it is already closed or disconnected). Otherwise, for a connected socket, DMTCP will try to drain data from both sides of the connection, but the other side (the outside world) is not under DMTCP's control, so the drain cannot finish. That is the "Still draining socket..." warning you saw.
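To make the mechanism concrete, here is a minimal sketch of the dispatch such a plugin performs, using the weak-symbol approach mentioned earlier in this thread. It is written so that it compiles without dmtcp.h: the enum values are stand-ins for DMTCP_EVENT_THREADS_SUSPEND and DMTCP_EVENT_THREADS_RESUME, and sketch_event_hook(), app_logger_disconnect() and app_logger_reconnect() are hypothetical names, not part of DMTCP or of your application. In a real plugin the same switch would live inside dmtcp_event_hook().

```c
#include <stddef.h>

/* Stand-ins for DMTCP_EVENT_THREADS_SUSPEND / DMTCP_EVENT_THREADS_RESUME so
 * that this sketch compiles without dmtcp.h. These names are hypothetical. */
typedef enum {
  SKETCH_THREADS_SUSPEND,
  SKETCH_THREADS_RESUME,
  SKETCH_OTHER_EVENT
} sketch_event_t;

/* Hypothetical functions exported by the application. They are declared
 * weak (a GCC/Clang extension), so the plugin still loads when the
 * application does not define them; each reference is then NULL. */
extern void app_logger_disconnect(void) __attribute__((weak));
extern void app_logger_reconnect(void) __attribute__((weak));

/* Returns 1 if an application callback was invoked, 0 otherwise. */
int sketch_event_hook(sketch_event_t event)
{
  switch (event) {
  case SKETCH_THREADS_SUSPEND:
    /* Close the link to the external logger before the checkpoint
     * proceeds, so there is no connected socket left to drain. */
    if (app_logger_disconnect != NULL) {
      app_logger_disconnect();
      return 1;
    }
    return 0;

  case SKETCH_THREADS_RESUME:
    /* The checkpoint is done: let the application reconnect. */
    if (app_logger_reconnect != NULL) {
      app_logger_reconnect();
      return 1;
    }
    return 0;

  default:
    /* All other events are ignored by this sketch. */
    return 0;
  }
}
```

For the plugin to see the application's functions, the application probably has to export them (for example by linking with -rdynamic). Since the application threads are suspended when the first callback runs, it is safest to keep it short and to avoid locks those threads might hold.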
In fact, many of DMTCP's internal features are implemented as plugins. This makes the implementation more loosely coupled, and makes it easier for end users to write their own plugins. We're very happy to help you with this, and we'd welcome any advice from you on improving the plugin API, or a note on any difficulties you had understanding or writing your own plugin.

Best,
Jiajun

On Sat, Oct 31, 2015 at 11:53 AM, Edson Tavares de Camargo <etcamarg...@gmail.com> wrote:

> Hi Everyone,
>
> Actually, my problem was solved after I built a simple plugin and ran it with dmtcp_launch. DMTCP manages the TCP connection between my MPI application and my external application (which is inside my cluster) only if the plugin is loaded by dmtcp_launch. For example:
>
> - dmtcp_launch mpirun ...
>
> It doesn't work. DMTCP doesn't manage to drain the buffers.
>
> - dmtcp_launch --with-plugin plugin.so mpirun ...
>
> It works fine!
>
> Could you explain to me why it works with the plugin and doesn't work without the plugin?
>
> Thanks!
>
> Edson
>
> 2015-10-29 2:44 GMT+01:00 Jiajun Cao <jia...@ccs.neu.edu>:
>
>> Hi Edson,
>>
>> The best way to achieve this is to write a tiny plugin and, in dmtcp_event_hook(), shut down the connection at DMTCP_EVENT_THREADS_SUSPEND. Your application should expose an API to do this. Or you can define a weak symbol, and use dlsym() and RTLD_NEXT to find the symbol in your app.
>>
>> Best,
>> Jiajun
>>
>> On Wed, Oct 28, 2015 at 2:12 PM, Edson Tavares de Camargo <etcamarg...@gmail.com> wrote:
>>
>>> Hi, Jiajun
>>>
>>> > I guess what you want to do is to make sure the connection to the outside world is shut down before DMTCP handles the network connection.
>>>
>>> Yes, that is exactly what I would like to do.
>>>
>>> > If that is the case, you should use the event DMTCP_EVENT_THREADS_SUSPEND instead, and restore the connection at event DMTCP_EVENT_THREADS_RESUME.
>>>
>>> But my question now is how I can get the event DMTCP_EVENT_THREADS_SUSPEND inside my application.
>>>
>>> I have managed to get the event DMTCP_EVENT_THREADS_SUSPEND inside a DMTCP plugin, but I still don't understand how to make the DMTCP plugin send a message to my application, or even how to make the DMTCP plugin call a function in my application (asking it to shut down the connection).
>>>
>>> It would be very nice if I could have the dmtcp_event_hook inside my application. Is it possible to do that? Is there another way to tell my application to shut down the connection to the outside world just before the checkpoint?
>>>
>>> Thanks!
>>>
>>> Edson
>>>
>>> 2015-10-27 20:45 GMT+01:00 Jiajun Cao <jia...@ccs.neu.edu>:
>>>
>>>> Hi Edson,
>>>>
>>>> DMTCP_EVENT_WRITE_CKPT corresponds to the event right at the time of writing the checkpoint images into storage. At this point, the processing of network connections is already finished. I guess what you want to do is to make sure the connection to the outside world is shut down before DMTCP handles the network connection. If that is the case, you should use the event DMTCP_EVENT_THREADS_SUSPEND instead, and restore the connection at event DMTCP_EVENT_THREADS_RESUME.
>>>>
>>>> Best,
>>>> Jiajun
>>>>
>>>> On Tue, Oct 27, 2015 at 2:48 PM, Edson Tavares de Camargo <etcamarg...@gmail.com> wrote:
>>>>
>>>>> Hi Jiajun,
>>>>>
>>>>> Thank you for your answer.
>>>>>
>>>>> I use a TCP connection between the MPI application and a logger entity. It is just to store some information about the MPI messages.
>>>>>
>>>>> I have realized that when the connection between the MPI application and the Logger finishes, DMTCP is able to make the checkpoint. Would it be possible to close the connection just before the checkpoint and restore it when the checkpoint has finished?
>>>>>
>>>>> I am trying to use the dmtcp_event_hook, but the event DMTCP_EVENT_WRITE_CKPT seems to be called only after the following warning message:
>>>>>
>>>>> [42000] WARNING at kernelbufferdrainer.cpp:124 in onTimeoutInterval;
>>>>> REASON='JWARNING(false) failed'
>>>>> _dataSockets[i]->socket().sockfd() = 14
>>>>> buffer.size() = 0
>>>>>
>>>>> WARN_INTERVAL_SEC = 10
>>>>> Message: Still draining socket... perhaps remote host is not running under DMTCP?
>>>>>
>>>>> Is there a way to capture the event before that warning message?
>>>>>
>>>>> Thanks a lot!!!
>>>>>
>>>>> Edson
>>>>>
>>>>> On Oct 26, 2015 5:25 PM, "Jiajun Cao" <jia...@ccs.neu.edu> wrote:
>>>>>
>>>>>> Hi Edson,
>>>>>>
>>>>>> The error is what's expected. DMTCP considers the computation as a whole, i.e., all processes involved in a computation must run under DMTCP. Technically, this is because DMTCP must handle the network communication. At the time of a checkpoint, DMTCP needs to drain the data in the sockets so that no in-flight data is lost. In your case, the other side of the socket is not under the control of DMTCP.
>>>>>>
>>>>>> Also, if possible, could you tell us what kind of application you are running? I haven't tested DMTCP on MPI applications communicating with the external world. This can be a good test suite for us.
>>>>>>
>>>>>> Best,
>>>>>> Jiajun
>>>>>>
>>>>>> On Mon, Oct 26, 2015 at 6:46 AM, Edson Tavares de Camargo <etcamarg...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Everyone!
>>>>>>>
>>>>>>> I have a question: What is the expected behaviour of DMTCP when I use DMTCP on an MPI application that exchanges messages with another application that is not running under dmtcp_launch?
>>>>>>>
>>>>>>> I ask because I get an error when I run an MPI application that exchanges messages via TCP with another application.
>>>>>>> Both applications are running on my cluster, but I only need to checkpoint the MPI application. The error is the following:
>>>>>>>
>>>>>>> ========
>>>>>>> WARNING at kernelbufferdrainer.cpp:120 in onTimeoutInterval;
>>>>>>> REASON='JWARNING(false) failed'
>>>>>>> _dataSockets[i]->socket().sockfd() = 15
>>>>>>> buffer.size() = 1059
>>>>>>> WARN_INTERVAL_SEC = 10
>>>>>>> Message: Still draining socket... perhaps remote host is not running under DMTCP?
>>>>>>> =======
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> Edson
>>>>>>> -------
>>>>>>>
>>>>>>> ------------------------------------------------------------------------------
>>>>>>> _______________________________________________
>>>>>>> Dmtcp-forum mailing list
>>>>>>> Dmtcp-forum@lists.sourceforge.net
>>>>>>> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum