Hi Jiajun,

I haven't shut down the connection to the outside world when threads
suspend. I've built a plugin with the dmtcp_event_hook function, but I
haven't put any application logic in it. I am using
libevent[1] to connect my MPI processes to the outside world. That is,
libevent is responsible for managing the TCP connection. Maybe it happened
because libevent...

Anyway, things are working now. If I have more problems I will let you know.

Thanks a lot again!

[1] http://libevent.org/

Edson

2015-10-31 22:12 GMT+01:00 Jiajun Cao <jia...@ccs.neu.edu>:

> Hi Edson,
>
> I suppose you wrote the plugin based on what I suggested earlier, i.e.,
> shutting down the connection to the outside world when threads suspend.
> Your plugin library is preloaded by dmtcp, and its handler for each event
> (DMTCP_EVENT_THREADS_SUSPEND, DMTCP_EVENT_WRITE_CKPT, etc.) is called in
> sequence. In your case, at the event DMTCP_EVENT_THREADS_SUSPEND,
> your plugin will shut down the TCP connection, so that dmtcp doesn't need
> to handle it at later events (since it's closed, or disconnected).
> Otherwise, for a connected socket, dmtcp will try to drain data from both
> sides of the connection, but the other side (the outside world) is not
> controlled under dmtcp.
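
The sequence described above can be pictured with a small stand-alone sketch. It does not use the DMTCP headers: the DEMO_EVENT_* names, demo_event_hook, and the logger_* helpers are all hypothetical stand-ins, assumed only for illustration. In a real plugin the same switch would presumably sit inside dmtcp_event_hook and react to DMTCP_EVENT_THREADS_SUSPEND and DMTCP_EVENT_THREADS_RESUME.

```c
/* Stand-ins for the event names discussed in this thread. A real plugin
 * would take them from DMTCP's own header instead of defining them. */
typedef enum {
  DEMO_EVENT_THREADS_SUSPEND,
  DEMO_EVENT_WRITE_CKPT,
  DEMO_EVENT_THREADS_RESUME
} demo_event_t;

/* Hypothetical application state: 1 while the logger connection is open. */
static int logger_connected = 1;

/* Hypothetical helpers; real code would close and re-open the socket. */
static void logger_disconnect(void) { logger_connected = 0; }
static void logger_reconnect(void)  { logger_connected = 1; }

/* Mirrors the shape of a plugin event hook: called once per event. */
void demo_event_hook(demo_event_t event)
{
  switch (event) {
  case DEMO_EVENT_THREADS_SUSPEND:
    /* Close the external connection before any socket handling starts. */
    logger_disconnect();
    break;
  case DEMO_EVENT_THREADS_RESUME:
    /* The checkpoint is done; bring the external connection back. */
    logger_reconnect();
    break;
  default:
    /* Other events (e.g. writing the checkpoint) need no action here. */
    break;
  }
}

/* Small accessor so callers can observe the connection state. */
int demo_logger_connected(void) { return logger_connected; }
```

Calling demo_event_hook with the suspend, write, and resume events in that order leaves the connection closed for the whole checkpoint window and open again afterwards, which is the ordering this message relies on.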
>
> In fact, many internal functionalities of dmtcp are implemented using
> plugins. This makes the implementation more loosely coupled, and makes it
> easier for end users to write their own plugins. We're very happy to help
> you with this, and we'd welcome any advice from you about improving the
> plugin API, or about any difficulties understanding/writing your own plugin.
>
>
> Best,
> Jiajun
>
> On Sat, Oct 31, 2015 at 11:53 AM, Edson Tavares de Camargo <
> etcamarg...@gmail.com> wrote:
>
>> Hi Everyone
>>
>> Actually, my problem was solved after I built a simple plugin and ran it
>> with dmtcp_launch. DMTCP manages the TCP connection between my MPI
>> application and my external application (which is inside my cluster)
>> only if the plugin is loaded together with dmtcp_launch. For example:
>>
>> - dmtcp_launch   mpirun ...
>>
>> It doesn't work. DMTCP doesn't manage to drain the buffers.
>>
>> - dmtcp_launch --with-plugin plugin.so  mpirun ...
>>
>> It works fine!
>>
>> Could you explain to me why it works with the plugin and doesn't work
>> without the plugin?
>>
>> Thanks!
>>
>> Edson
>>
>> 2015-10-29 2:44 GMT+01:00 Jiajun Cao <jia...@ccs.neu.edu>:
>>
>>> Hi Edson,
>>>
>>> The best way to achieve this is to write a tiny plugin, and in
>>> dmtcp_event_hook(), shut down the connection at
>>> DMTCP_EVENT_THREADS_SUSPEND.
>>> Your application should expose an API to do this. Or, you can define a weak
>>> symbol, and use dlsym() and RTLD_NEXT to find the symbol in your app.
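
As a rough illustration of the weak-symbol idea, the sketch below shows only the weak-reference half; the dlsym() lookup is left out. The names app_shutdown_connection and notify_app_before_checkpoint are hypothetical, and the weak attribute is a GCC/Clang extension, not part of standard C. For a preloaded plugin to see the function, the application may also need to export it dynamically (for instance with -rdynamic on GNU toolchains); treat that as an assumption to verify.

```c
#include <stddef.h>

/* Hypothetical function that the application may or may not provide.
 * Because the reference is weak, the program still links when nobody
 * defines it; its address then evaluates to NULL. */
extern void app_shutdown_connection(void) __attribute__((weak));

/* Calls the application's function if it exists.
 * Returns 1 if the function was found and called, 0 otherwise. */
int notify_app_before_checkpoint(void)
{
  if (app_shutdown_connection != NULL) {
    app_shutdown_connection();
    return 1;
  }
  return 0;
}
```

A plugin could call something like notify_app_before_checkpoint() from its handler for DMTCP_EVENT_THREADS_SUSPEND, so the plugin keeps working with applications that never define the function.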
>>>
>>> Best,
>>> Jiajun
>>>
>>> On Wed, Oct 28, 2015 at 2:12 PM, Edson Tavares de Camargo <
>>> etcamarg...@gmail.com> wrote:
>>>
>>>> Hi, Jiajun
>>>>
>>>> > I guess what you want to do is to make sure the connection to the
>>>> outside world is shut down before DMTCP handles the network connection.
>>>>
>>>> Yes, that is exactly what I would like to do.
>>>>
>>>> > If that is the case, you should  use the event
>>>> DMTCP_EVENT_THREADS_SUSPEND instead, and restore the connection at event
>>>> DMTCP_EVENT_THREADS_RESUME.
>>>>
>>>> But my question now is: how can I get the event
>>>> DMTCP_EVENT_THREADS_SUSPEND inside my application?
>>>>
>>>> I have managed to get the event DMTCP_EVENT_THREADS_SUSPEND inside a
>>>> DMTCP plugin, but I still don't understand how to make the DMTCP
>>>> plugin send a message to my application, or even how to make the DMTCP
>>>> plugin call a function in my application (asking it to shut down the
>>>> connection).
>>>>
>>>> It would be very nice if I could have the dmtcp_event_hook inside my
>>>> application. Is it possible to do that? Is there another way to tell
>>>> my application to shut down the connection to the outside world at the
>>>> moment before the checkpoint?
>>>>
>>>> Thanks!
>>>>
>>>> Edson
>>>>
>>>>
>>>>
>>>> 2015-10-27 20:45 GMT+01:00 Jiajun Cao <jia...@ccs.neu.edu>:
>>>>
>>>>> Hi Edson,
>>>>>
>>>>> DMTCP_EVENT_WRITE_CKPT corresponds to the event right at the time of
>>>>> writing the checkpoint images to storage. At this point, the processing
>>>>> of network connections is already finished. I guess what you want to do
>>>>> is to make sure the connection to the outside world is shut down before
>>>>> DMTCP handles the network connection. If that is the case, you should
>>>>> use the event DMTCP_EVENT_THREADS_SUSPEND instead, and restore the
>>>>> connection at event DMTCP_EVENT_THREADS_RESUME.
>>>>>
>>>>> Best,
>>>>> Jiajun
>>>>>
>>>>> On Tue, Oct 27, 2015 at 2:48 PM, Edson Tavares de Camargo <
>>>>> etcamarg...@gmail.com> wrote:
>>>>>
>>>>>> Hi Jiajun,
>>>>>>
>>>>>> Thank you for your answer.
>>>>>>
>>>>>> I use a TCP connection between the MPI application and a logger
>>>>>> entity. It is just to store some information about the MPI messages.
>>>>>>
>>>>>> I have realized that when the connection between the MPI application
>>>>>> and the Logger is closed, DMTCP is able to take the checkpoint. Would
>>>>>> it be possible to close the connection just before the checkpoint
>>>>>> and restore it when the checkpoint has finished?
>>>>>>
>>>>>> I am trying to use the dmtcp_event_hook, but the event
>>>>>> DMTCP_EVENT_WRITE_CKPT seems to be triggered only after the following
>>>>>> warning message:
>>>>>>
>>>>>> [42000] WARNING at kernelbufferdrainer.cpp:124 in onTimeoutInterval;
>>>>>> REASON='JWARNING(false) failed'
>>>>>>      _dataSockets[i]->socket().sockfd() = 14
>>>>>>      buffer.size() = 0
>>>>>>
>>>>>>      WARN_INTERVAL_SEC = 10
>>>>>> Message: Still draining socket... perhaps remote host is not running
>>>>>> under DMTCP?
>>>>>>
>>>>>> Is there a way to capture the event before that warning message?
>>>>>>
>>>>>> Thanks a lot!!!
>>>>>>
>>>>>> Edson
>>>>>> On Oct 26, 2015 5:25 PM, "Jiajun Cao" <jia...@ccs.neu.edu> wrote:
>>>>>>
>>>>>>> Hi Edson,
>>>>>>>
>>>>>>> This error is expected. DMTCP considers the computation as a
>>>>>>> whole, i.e., all processes involved in a computation must run
>>>>>>> under DMTCP. Technically, this is because DMTCP must handle the network
>>>>>>> communication. At the time of a checkpoint, DMTCP needs to drain the
>>>>>>> data in the sockets so that no in-flight data is lost. In your
>>>>>>> case, the other side of the socket is not under the control of DMTCP.
>>>>>>>
>>>>>>> Also, if possible, could you tell us what kind of application you
>>>>>>> are running? I haven't tested DMTCP on MPI applications communicating
>>>>>>> with the external world. This can be a good test suite for us.
>>>>>>>
>>>>>>>
>>>>>>> Best,
>>>>>>> Jiajun
>>>>>>>
>>>>>>> On Mon, Oct 26, 2015 at 6:46 AM, Edson Tavares de Camargo <
>>>>>>> etcamarg...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Everyone!
>>>>>>>>
>>>>>>>>
>>>>>>>> I have a question: What is the expected behaviour of DMTCP when I
>>>>>>>> use DMTCP on an MPI application that exchanges messages with another
>>>>>>>> application that is not running under dmtcp_launch?
>>>>>>>>
>>>>>>>> I ask because I get an error when I execute an MPI application that
>>>>>>>> exchanges messages via TCP with another application. Both applications
>>>>>>>> are running on my cluster, but I only need to checkpoint the MPI
>>>>>>>> application. The error is the following:
>>>>>>>>
>>>>>>>> ========
>>>>>>>> WARNING at kernelbufferdrainer.cpp:120 in onTimeoutInterval;
>>>>>>>> REASON='JWARNING(false) failed'
>>>>>>>>      _dataSockets[i]->socket().sockfd() = 15
>>>>>>>>      buffer.size() = 1059
>>>>>>>>      WARN_INTERVAL_SEC = 10
>>>>>>>> Message: Still draining socket... perhaps remote host is not
>>>>>>>> running under DMTCP?
>>>>>>>> =======
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> Edson
>>>>>>>> -------
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> ------------------------------------------------------------------------------
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Dmtcp-forum mailing list
>>>>>>>> Dmtcp-forum@lists.sourceforge.net
>>>>>>>> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
