Hi Edson,

The best way to achieve this is to write a tiny plugin and, in
dmtcp_event_hook(), shut down the connection when the plugin receives
DMTCP_EVENT_THREADS_SUSPEND.
Your application should expose an API to do this. Alternatively, you can
define a weak symbol, and use dlsym() with RTLD_NEXT to find the symbol in
your app.
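
A minimal sketch of the weak-symbol variant, as an illustration under stated
assumptions rather than code tested against DMTCP: app_close_logger() and
app_reopen_logger() are hypothetical names that your application would have
to define, and the small enum only stands in for the two DMTCP events so that
the sketch compiles without dmtcp.h. In a real plugin the same dispatch would
sit inside dmtcp_event_hook(), switching on the real event values.

```c
#include <stddef.h>

/* Stand-in for the two DMTCP events discussed in this thread. In a real
 * plugin these values come from dmtcp.h; they are redefined here only so
 * that the sketch is self-contained. */
typedef enum {
    SKETCH_THREADS_SUSPEND,
    SKETCH_THREADS_RESUME,
    SKETCH_OTHER
} sketch_event_t;

/* Hypothetical hooks implemented by the application (names are assumptions).
 * They are declared weak, so the plugin still loads and runs when the
 * application does not define them; in that case their address is NULL. */
extern void app_close_logger(void) __attribute__((weak));
extern void app_reopen_logger(void) __attribute__((weak));

/* Calls the matching application hook, if the application provides one.
 * Returns 1 if a hook was called, 0 otherwise. */
int sketch_dispatch(sketch_event_t ev)
{
    switch (ev) {
    case SKETCH_THREADS_SUSPEND:
        /* Before DMTCP handles the sockets: drop the external connection. */
        if (app_close_logger != NULL) {
            app_close_logger();
            return 1;
        }
        break;
    case SKETCH_THREADS_RESUME:
        /* After the checkpoint: let the application reconnect. */
        if (app_reopen_logger != NULL) {
            app_reopen_logger();
            return 1;
        }
        break;
    default:
        break;
    }
    return 0;
}
```

If the application defines neither function, both addresses are null and the
plugin does nothing. Depending on how the application is built, it may need
to be linked with -rdynamic so that the plugin can see its symbols; that is
an assumption about your setup, so please check it.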

Best,
Jiajun

On Wed, Oct 28, 2015 at 2:12 PM, Edson Tavares de Camargo <
etcamarg...@gmail.com> wrote:

> Hi, Jiajun
>
> > I guess what you want to do is to make sure the connection to the
> outside world is shut down before DMTCP handles the network connection.
>
> Yes, that is exactly what I would like to do.
>
> > If that is the case, you should use the event
> DMTCP_EVENT_THREADS_SUSPEND instead, and restore the connection at event
> DMTCP_EVENT_THREADS_RESUME.
>
> But my question now is: how can I get the event DMTCP_EVENT_THREADS_SUSPEND
> inside my application?
>
> I have managed to get the event DMTCP_EVENT_THREADS_SUSPEND inside a
> DMTCP plugin, but I still can't work out how to make the DMTCP plugin
> send a message to my application, or even call a function in my
> application (asking it to shut down the connection).
>
> It would be very nice if I could have dmtcp_event_hook inside my
> application. Is it possible to do that? Is there another way to tell
> my application to shut down the connection to the outside world just
> before the checkpoint?
>
> Thanks!
>
> Edson
>
>
>
> 2015-10-27 20:45 GMT+01:00 Jiajun Cao <jia...@ccs.neu.edu>:
>
>> Hi Edson,
>>
>> DMTCP_EVENT_WRITE_CKPT corresponds to the moment when the checkpoint
>> images are written to storage. At this point, the processing of network
>> connections has already finished. I guess what you want to do is
>> to make sure the connection to the outside world is shut down before DMTCP
>> handles the network connection. If that is the case, you should use the
>> event DMTCP_EVENT_THREADS_SUSPEND instead, and restore the connection at
>> event DMTCP_EVENT_THREADS_RESUME.
>>
>> Best,
>> Jiajun
>>
>> On Tue, Oct 27, 2015 at 2:48 PM, Edson Tavares de Camargo <
>> etcamarg...@gmail.com> wrote:
>>
>>> Hi Jiajun,
>>>
>>> Thank you for your answer.
>>>
>>> I use a TCP connection between the MPI application and a logger entity.
>>> It is just to store some information about the MPI messages.
>>>
>>> I have realized that when the connection between the MPI application and
>>> the logger is closed, DMTCP is able to take the checkpoint. Would it be
>>> possible to close the connection just before the checkpoint and
>>> restore it when the checkpoint has finished?
>>>
>>> I am trying to use dmtcp_event_hook, but the event
>>> DMTCP_EVENT_WRITE_CKPT seems to be delivered only after the following
>>> warning message:
>>>
>>> [42000] WARNING at kernelbufferdrainer.cpp:124 in onTimeoutInterval;
>>> REASON='JWARNING(false) failed'
>>>      _dataSockets[i]->socket().sockfd() = 14
>>>      buffer.size() = 0
>>>
>>>      WARN_INTERVAL_SEC = 10
>>> Message: Still draining socket... perhaps remote host is not running
>>> under DMTCP?
>>>
>>> Is there a way to capture an event before that warning message?
>>>
>>> Thanks a lot!!!
>>>
>>> Edson
>>> On Oct 26, 2015 5:25 PM, "Jiajun Cao" <jia...@ccs.neu.edu> wrote:
>>>
>>>> Hi Edson,
>>>>
>>>> The error is expected. DMTCP treats the computation as a whole, i.e.,
>>>> all processes involved in a computation must run under DMTCP.
>>>> Technically, this is because DMTCP must handle the network
>>>> communication. At the time of a checkpoint, DMTCP needs to drain the data
>>>> in the sockets so that no in-flight data is lost. In your
>>>> case, the other side of the socket is not under the control of DMTCP.
>>>>
>>>> Also, if possible, could you tell us what kind of application you are
>>>> running? I haven't tested DMTCP on MPI applications communicating with the
>>>> external world. This could be a good test case for us.
>>>>
>>>>
>>>> Best,
>>>> Jiajun
>>>>
>>>> On Mon, Oct 26, 2015 at 6:46 AM, Edson Tavares de Camargo <
>>>> etcamarg...@gmail.com> wrote:
>>>>
>>>>> Hi Everyone!
>>>>>
>>>>>
>>>>> I have a question: What is the expected behaviour of DMTCP when I use
>>>>> DMTCP on an MPI application that exchanges messages with another
>>>>> application that is not running under dmtcp_launch?
>>>>>
>>>>> I ask because I get an error when I execute an MPI application that
>>>>> exchanges messages via TCP with another application. Both applications
>>>>> are running on my cluster, but I only need to checkpoint the MPI
>>>>> application. The error is the following:
>>>>>
>>>>> ========
>>>>> WARNING at kernelbufferdrainer.cpp:120 in onTimeoutInterval;
>>>>> REASON='JWARNING(false) failed'
>>>>>      _dataSockets[i]->socket().sockfd() = 15
>>>>>      buffer.size() = 1059
>>>>>      WARN_INTERVAL_SEC = 10
>>>>> Message: Still draining socket... perhaps remote host is not running
>>>>> under DMTCP?
>>>>> =======
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Edson
>>>>> -------
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ------------------------------------------------------------------------------
>>>>>
>>>>> _______________________________________________
>>>>> Dmtcp-forum mailing list
>>>>> Dmtcp-forum@lists.sourceforge.net
>>>>> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
>>>>>
>>>>>
>>>>
>>>
>>
>