Hi Everyone

My problem was actually solved once I built a simple plugin and ran it
with dmtcp_launch. DMTCP manages the TCP connection between my MPI
application and my external application (which runs inside my cluster)
only when the plugin is loaded together with dmtcp_launch. For example:

- dmtcp_launch   mpirun ...

This doesn't work. DMTCP fails to drain the buffers.

- dmtcp_launch --with-plugin plugin.so  mpirun ...

It works fine!
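
For reference, here is a minimal sketch of the kind of plugin suggested in the quoted reply below: close the external connection at DMTCP_EVENT_THREADS_SUSPEND and reopen it at DMTCP_EVENT_THREADS_RESUME, reaching the application through weak symbols. It is not necessarily the plugin used above. The names app_close_logger() and app_reopen_logger() are hypothetical functions the MPI application would have to provide, and the type/macro stand-ins at the top only exist so the sketch compiles on its own; a real plugin would include dmtcp.h instead (assumed here: the DMTCP 2.x plugin interface).

```c
#include <stddef.h>

/* Stand-ins so this sketch compiles by itself. In a real plugin, delete
 * this section and use  #include "dmtcp.h"  instead (assumption: the
 * DMTCP 2.x interface with dmtcp_event_hook and DMTCP_NEXT_EVENT_HOOK). */
typedef enum {
  DMTCP_EVENT_WRITE_CKPT,
  DMTCP_EVENT_THREADS_SUSPEND,
  DMTCP_EVENT_THREADS_RESUME
} DmtcpEvent_t;
typedef void DmtcpEventData_t;
#define DMTCP_NEXT_EVENT_HOOK(event, data) ((void)(event), (void)(data))

/* Hypothetical functions defined by the MPI application. They are declared
 * weak so the plugin still loads into processes that do not define them
 * (mpirun itself, for instance); in that case the addresses are NULL. */
extern void app_close_logger(void) __attribute__((weak));
extern void app_reopen_logger(void) __attribute__((weak));

/* Returns 1 if an application callback was invoked, 0 otherwise. */
static int notify_app(DmtcpEvent_t event)
{
  switch (event) {
  case DMTCP_EVENT_THREADS_SUSPEND:
    /* Close the socket to the process that is not under DMTCP. */
    if (app_close_logger != NULL) {
      app_close_logger();
      return 1;
    }
    break;
  case DMTCP_EVENT_THREADS_RESUME:
    /* Re-establish the connection once the threads resume. */
    if (app_reopen_logger != NULL) {
      app_reopen_logger();
      return 1;
    }
    break;
  default:
    break;
  }
  return 0;
}

void dmtcp_event_hook(DmtcpEvent_t event, DmtcpEventData_t *data)
{
  (void)notify_app(event);
  /* Pass the event on to the next plugin in the chain. */
  DMTCP_NEXT_EVENT_HOOK(event, data);
}
```

Such a file would typically be built as a shared object (for example, gcc -shared -fPIC plugin.c -o plugin.so) and loaded with dmtcp_launch --with-plugin plugin.so mpirun ... as above. For the weak references to resolve, the application has to export the two functions to the dynamic linker, for example by linking it with -rdynamic.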

Could you explain to me why it works with the plugin and doesn't work
without it?

Thanks!

Edson

2015-10-29 2:44 GMT+01:00 Jiajun Cao <jia...@ccs.neu.edu>:

> Hi Edson,
>
> The best way to achieve this is to write a tiny plugin and, in
> dmtcp_event_hook(), shut down the connection at
> DMTCP_EVENT_THREADS_SUSPEND.
> Your application should expose an API to do this. Or, you can define a weak
> symbol, and use dlsym() and RTLD_NEXT to find the symbol in your app.
>
> Best,
> Jiajun
>
> On Wed, Oct 28, 2015 at 2:12 PM, Edson Tavares de Camargo <
> etcamarg...@gmail.com> wrote:
>
>> Hi, Jiajun
>>
>> > I guess what you want to do is to make sure the connection to the
>> outside world is shut down before DMTCP handles the network connection.
>>
>> Yes, that is exactly what I would like to do.
>>
>> > If that is the case, you should  use the event
>> DMTCP_EVENT_THREADS_SUSPEND instead, and restore the connection at event
>> DMTCP_EVENT_THREADS_RESUME.
>>
>> But my question now is how I can get the event
>> DMTCP_EVENT_THREADS_SUSPEND inside my application?
>>
>> I have managed to receive the event DMTCP_EVENT_THREADS_SUSPEND inside a
>> DMTCP plugin, but I still don't understand how to make the DMTCP
>> plugin send a message to my application, or how to make the DMTCP plugin
>> call a function in my application (asking it to shut down the connection).
>>
>> It would be very nice if I could have dmtcp_event_hook inside my
>> application. Is it possible to do that? Is there another way to tell
>> my application to shut down the connection to the outside world just
>> before the checkpoint?
>>
>> Thanks!
>>
>> Edson
>>
>>
>>
>> 2015-10-27 20:45 GMT+01:00 Jiajun Cao <jia...@ccs.neu.edu>:
>>
>>> Hi Edson,
>>>
>>> DMTCP_EVENT_WRITE_CKPT corresponds to the event right at the time of
>>> writing the checkpoint images into storage. At this point, the processing
>>> of network connection is already finished. I guess what you want to do is
>>> to make sure the connection to the outside world is shut down before DMTCP
>>> handles the network connection. If that is the case, you should  use the
>>> event DMTCP_EVENT_THREADS_SUSPEND instead, and restore the connection at
>>> event DMTCP_EVENT_THREADS_RESUME.
>>>
>>> Best,
>>> Jiajun
>>>
>>> On Tue, Oct 27, 2015 at 2:48 PM, Edson Tavares de Camargo <
>>> etcamarg...@gmail.com> wrote:
>>>
>>>> Hi Jiajun,
>>>>
>>>> Thank you for your answer.
>>>>
>>>> I use a TCP connection between the MPI application and a logger entity.
>>>> It is just to store some information about the MPI messages.
>>>>
>>>> I have noticed that once the connection between the MPI application
>>>> and the logger is closed, DMTCP is able to take the checkpoint. Would
>>>> it be possible to close the connection just before the checkpoint
>>>> and restore it when the checkpoint has finished?
>>>>
>>>> I am trying to use dmtcp_event_hook, but the event
>>>> DMTCP_EVENT_WRITE_CKPT seems to be delivered only after the following
>>>> warning message:
>>>>
>>>> [42000] WARNING at kernelbufferdrainer.cpp:124 in onTimeoutInterval;
>>>> REASON='JWARNING(false) failed'
>>>>      _dataSockets[i]->socket().sockfd() = 14
>>>>      buffer.size() = 0
>>>>
>>>>      WARN_INTERVAL_SEC = 10
>>>> Message: Still draining socket... perhaps remote host is not running
>>>> under DMTCP?
>>>>
>>>> Is there a way to catch the event before that warning message?
>>>>
>>>> Thanks a lot!!!
>>>>
>>>> Edson
>>>> On Oct 26, 2015 5:25 PM, "Jiajun Cao" <jia...@ccs.neu.edu> wrote:
>>>>
>>>>> Hi Edson,
>>>>>
>>>>> The error is what's expected. DMTCP considers the computation as a
>>>>> whole, i.e., for all processes involved in a computation, they must run
>>>>> under DMTCP. Technically, this is because DMTCP must handle the network
>>>>> communication. At the time of a checkpoint, DMTCP needs to drain the data
>>>>> in the sockets so that there won't be any lost data in-flight. In your
>>>>> case, the other side of the socket is not under the control of DMTCP.
>>>>>
>>>>> Also, if possible, could you tell us what kind of application you are
>>>>> running? I haven't tested DMTCP on MPI applications communicating with the
>>>>> external world. This could be a good test case for us.
>>>>>
>>>>>
>>>>> Best,
>>>>> Jiajun
>>>>>
>>>>> On Mon, Oct 26, 2015 at 6:46 AM, Edson Tavares de Camargo <
>>>>> etcamarg...@gmail.com> wrote:
>>>>>
>>>>>> Hi Everyone!
>>>>>>
>>>>>>
>>>>>> I have a question: what is the expected behaviour of DMTCP when I use
>>>>>> it on an MPI application that exchanges messages with another
>>>>>> application that is not running under dmtcp_launch?
>>>>>>
>>>>>> I ask because I get an error when I run an MPI application that
>>>>>> exchanges messages via TCP with another application. Both applications are
>>>>>> running on my cluster, but I only need to checkpoint the MPI
>>>>>> application. The error is the following:
>>>>>>
>>>>>> ========
>>>>>> WARNING at kernelbufferdrainer.cpp:120 in onTimeoutInterval;
>>>>>> REASON='JWARNING(false) failed'
>>>>>>      _dataSockets[i]->socket().sockfd() = 15
>>>>>>      buffer.size() = 1059
>>>>>>      WARN_INTERVAL_SEC = 10
>>>>>> Message: Still draining socket... perhaps remote host is not running
>>>>>> under DMTCP?
>>>>>> =======
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> Edson
>>>>>> -------
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> ------------------------------------------------------------------------------
>>>>>>
>>>>>> _______________________________________________
>>>>>> Dmtcp-forum mailing list
>>>>>> Dmtcp-forum@lists.sourceforge.net
>>>>>> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>