Hi, Jiajun

> I guess what you want to do is to make sure the connection to the outside
> world is shut down before DMTCP handles the network connection.

Yes, that is exactly what I would like to do.

> If that is the case, you should use the event
> DMTCP_EVENT_THREADS_SUSPEND instead, and restore the connection at event
> DMTCP_EVENT_THREADS_RESUME.

But my question now is: how can I receive the DMTCP_EVENT_THREADS_SUSPEND
event inside my application?

I have managed to receive the DMTCP_EVENT_THREADS_SUSPEND event inside a DMTCP
plugin, but I still don't understand how to make the plugin send a message to
my application, or even how to make the plugin call a function in my
application (asking it to shut down the connection).
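One possible way for a plugin to call into the application (an assumption on our part, not a documented DMTCP mechanism) is callback registration: at startup the application hands the plugin two function pointers, and the plugin's event hook invokes them on DMTCP_EVENT_THREADS_SUSPEND and DMTCP_EVENT_THREADS_RESUME. The sketch below stubs out the DMTCP types so that it compiles on its own; the SKETCH_EVENT_* values, logger_plugin_register, sketch_event_hook and the app_* functions are all hypothetical names. In a real plugin the switch would live in dmtcp_event_hook and use the event constants from dmtcp.h, and since the plugin is preloaded into the same process, the application could look up the registration function at run time, for example with dlsym(RTLD_DEFAULT, "logger_plugin_register").

```c
#include <stddef.h>

/* Hypothetical stand-ins for the two DMTCP events discussed in this thread;
   a real plugin uses DmtcpEvent_t and the constants from dmtcp.h. */
typedef enum {
  SKETCH_EVENT_THREADS_SUSPEND,
  SKETCH_EVENT_THREADS_RESUME,
  SKETCH_EVENT_OTHER
} sketch_event_t;

typedef void (*ckpt_callback_t)(void);

/* Callbacks registered by the application; NULL until registered. */
static ckpt_callback_t before_ckpt = NULL;
static ckpt_callback_t after_ckpt = NULL;

/* Hypothetical function exported by the plugin: the application calls it
   once at startup to hand over its two callbacks. */
void logger_plugin_register(ckpt_callback_t before, ckpt_callback_t after)
{
  before_ckpt = before;
  after_ckpt = after;
}

/* Plugin side: what the body of the event hook would do with each event. */
void sketch_event_hook(sketch_event_t event)
{
  switch (event) {
  case SKETCH_EVENT_THREADS_SUSPEND:
    if (before_ckpt != NULL)
      before_ckpt();   /* e.g. close the connection to the logger */
    break;
  case SKETCH_EVENT_THREADS_RESUME:
    if (after_ckpt != NULL)
      after_ckpt();    /* e.g. reconnect to the logger */
    break;
  default:
    break;             /* other events are ignored in this sketch */
  }
  /* A real plugin would pass the event on to the next hook here. */
}

/* Application side (hypothetical): callbacks that only track a flag. */
int logger_connected = 1;
void app_close_logger(void)  { logger_connected = 0; }
void app_reopen_logger(void) { logger_connected = 1; }
```

At startup the application would call logger_plugin_register(app_close_logger, app_reopen_logger); from then on, every suspend/resume pair seen by the plugin closes and reopens the connection, while the plugin itself never needs to know anything about the logger.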

It would be very nice if I could have dmtcp_event_hook inside my
application. Is it possible to do that? Is there another way to tell my
application to shut down the connection to the outside world just before
the checkpoint?
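On the application side, the two actions asked about here (closing the logger connection before the checkpoint and restoring it afterwards) could look roughly like the following POSIX sketch. It assumes an IPv4 TCP logger whose address stays the same across the checkpoint; logger_fd, logger_connect, logger_disconnect and logger_reconnect are hypothetical names, not part of DMTCP.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Connection to the (non-DMTCP) logger; -1 while closed. */
int logger_fd = -1;

/* Logger address, remembered so the connection can be reopened later. */
static struct sockaddr_in logger_addr;

/* (Re)open the TCP connection to the remembered address. 0 on success. */
int logger_reconnect(void)
{
  if (logger_fd >= 0)
    return 0;                 /* already connected */
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0)
    return -1;
  if (connect(fd, (const struct sockaddr *)&logger_addr,
              sizeof logger_addr) < 0) {
    close(fd);
    return -1;
  }
  logger_fd = fd;
  return 0;
}

/* Record the logger's IPv4 address and port, then connect. */
int logger_connect(const char *ip, unsigned short port)
{
  memset(&logger_addr, 0, sizeof logger_addr);
  logger_addr.sin_family = AF_INET;
  logger_addr.sin_port = htons(port);
  if (inet_pton(AF_INET, ip, &logger_addr.sin_addr) != 1)
    return -1;
  return logger_reconnect();
}

/* Before the checkpoint: close the socket so that no connection to a
   process outside DMTCP is open when DMTCP drains the sockets. */
void logger_disconnect(void)
{
  if (logger_fd >= 0) {
    shutdown(logger_fd, SHUT_RDWR);
    close(logger_fd);
    logger_fd = -1;
  }
}
```

In the scenario discussed in this thread, logger_disconnect would run when DMTCP_EVENT_THREADS_SUSPEND is seen and logger_reconnect on DMTCP_EVENT_THREADS_RESUME. The sketch does not deal with a log message that was only partly written when the connection is closed, or with a logger address that has to be looked up again after a restart.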

Thanks!

Edson



2015-10-27 20:45 GMT+01:00 Jiajun Cao <jia...@ccs.neu.edu>:

> Hi Edson,
>
> DMTCP_EVENT_WRITE_CKPT corresponds to the moment the checkpoint images are
> written to storage. By that point, the processing of network connections is
> already finished. I guess what you want to do is to make sure the connection
> to the outside world is shut down before DMTCP handles the network
> connection. If that is the case, you should use the event
> DMTCP_EVENT_THREADS_SUSPEND instead, and restore the connection at event
> DMTCP_EVENT_THREADS_RESUME.
>
> Best,
> Jiajun
>
> On Tue, Oct 27, 2015 at 2:48 PM, Edson Tavares de Camargo <
> etcamarg...@gmail.com> wrote:
>
>> Hi Jiajun,
>>
>> Thank you for your answer.
>>
>> I use a TCP connection between the MPI application and a logger entity.
>> It is just used to store some information about the MPI messages.
>>
>> I have noticed that when the connection between the MPI application and
>> the logger is closed, DMTCP is able to take the checkpoint. Would it be
>> possible to close the connection just before the checkpoint and restore it
>> after the checkpoint has finished?
>>
>> I am trying to use dmtcp_event_hook, but the event
>> DMTCP_EVENT_WRITE_CKPT seems to be delivered only after the following
>> warning message:
>>
>> [42000] WARNING at kernelbufferdrainer.cpp:124 in onTimeoutInterval;
>> REASON='JWARNING(false) failed'
>>      _dataSockets[i]->socket().sockfd() = 14
>>      buffer.size() = 0
>>
>>      WARN_INTERVAL_SEC = 10
>> Message: Still draining socket... perhaps remote host is not running
>> under DMTCP?
>>
>> Is there a way to capture the event before that warning message?
>>
>> Thanks a lot!!!
>>
>> Edson
>> On Oct 26, 2015 5:25 PM, "Jiajun Cao" <jia...@ccs.neu.edu> wrote:
>>
>>> Hi Edson,
>>>
>>> The error is expected. DMTCP treats the computation as a whole, i.e.,
>>> all processes involved in a computation must run under DMTCP.
>>> Technically, this is because DMTCP must handle the network
>>> communication. At the time of a checkpoint, DMTCP needs to drain the data
>>> in the sockets so that there won't be any lost data in-flight. In your
>>> case, the other side of the socket is not under the control of DMTCP.
>>>
>>> Also, if possible, could you tell us what kind of application you are
>>> running? I haven't tested DMTCP on MPI applications communicating with the
>>> external world. This could be a good test suite for us.
>>>
>>>
>>> Best,
>>> Jiajun
>>>
>>> On Mon, Oct 26, 2015 at 6:46 AM, Edson Tavares de Camargo <
>>> etcamarg...@gmail.com> wrote:
>>>
>>>> Hi Everyone!
>>>>
>>>>
>>>> I have a question: what is the expected behaviour of DMTCP when I use
>>>> DMTCP on an MPI application that exchanges messages with another
>>>> application that was not started with dmtcp_launch?
>>>>
>>>> I ask because I get an error when I run an MPI application that
>>>> exchanges messages via TCP with another application. Both applications
>>>> are running on my cluster, but I only need to checkpoint the MPI
>>>> application. The error is the following:
>>>>
>>>> ========
>>>> WARNING at kernelbufferdrainer.cpp:120 in onTimeoutInterval;
>>>> REASON='JWARNING(false) failed'
>>>>      _dataSockets[i]->socket().sockfd() = 15
>>>>      buffer.size() = 1059
>>>>      WARN_INTERVAL_SEC = 10
>>>> Message: Still draining socket... perhaps remote host is not running
>>>> under DMTCP?
>>>> =======
>>>>
>>>> Thanks!
>>>>
>>>> Edson
>>>> -------
>>>>
>>>>
>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> _______________________________________________
>>>> Dmtcp-forum mailing list
>>>> Dmtcp-forum@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
>>>>
>>>>
>>>
>>
>>
>>
>
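Jiajun's first reply explains that at checkpoint time DMTCP drains the data in flight on each socket, and that this cannot complete when the peer is not under DMTCP. The following is a simplified model of that idea, not DMTCP's actual code (its kernel buffer drainer lives in kernelbufferdrainer.cpp). It assumes the draining side reads until it sees a marker that only a DMTCP-controlled peer would send; where DMTCP keeps waiting and prints the "Still draining socket... perhaps remote host is not running under DMTCP?" warning quoted in this thread, the model simply gives up after a timeout. DRAIN_MARKER, drain_until_marker and send_marker are hypothetical names.

```c
#include <poll.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical end-of-drain marker (DMTCP uses its own internal value). */
static const char DRAIN_MARKER[] = "DRAIN-MARKER";
#define MARKER_LEN (sizeof DRAIN_MARKER - 1)

/* What a DMTCP-controlled peer would do at checkpoint time in this model. */
int send_marker(int fd)
{
  return write(fd, DRAIN_MARKER, MARKER_LEN) == (ssize_t)MARKER_LEN ? 0 : -1;
}

/* Read one byte at a time from fd into buf until the data ends with the
   marker. Returns the number of drained bytes (marker excluded), or -1 on
   timeout, read error, EOF, or when buf (capacity cap) is full. */
ssize_t drain_until_marker(int fd, char *buf, size_t cap, int timeout_ms)
{
  size_t n = 0;
  for (;;) {
    if (n >= MARKER_LEN &&
        memcmp(buf + n - MARKER_LEN, DRAIN_MARKER, MARKER_LEN) == 0)
      return (ssize_t)(n - MARKER_LEN);
    if (n == cap)
      return -1;
    struct pollfd p = { .fd = fd, .events = POLLIN };
    if (poll(&p, 1, timeout_ms) <= 0)
      return -1;               /* timeout: the peer never sent the marker */
    ssize_t got = read(fd, buf + n, 1);
    if (got <= 0)
      return -1;               /* error, or peer closed without a marker */
    n += (size_t)got;
  }
}
```

The model shows why the peer matters: a logger that does not run under DMTCP never writes the marker, so the drain can only end by timing out. It also ignores the case where the application data itself contains the marker bytes, which a real implementation would have to rule out.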