Hi Jiajun,

Thank you for your answer.

I use a TCP connection between the MPI application and a logger process. The
logger only stores some information about the MPI messages.

I have noticed that once the connection between the MPI application and
the logger is closed, DMTCP is able to take the checkpoint. Would it be
possible to close the connection just before the checkpoint and
restore it once the checkpoint has finished?
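
To make the idea concrete, here is a minimal sketch in C of what I have in
mind. It is not DMTCP code: logger_conn, logger_init, logger_connect,
logger_disconnect and checkpoint_without_logger are hypothetical names, and
the callback stands in for whatever actually triggers the checkpoint (I
believe dmtcp.h offers dmtcp_checkpoint() for application-initiated
checkpoints, but that is an assumption on my side). It would only help when
the application itself decides when to checkpoint.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical wrapper around the logger connection; not part of DMTCP. */
typedef struct {
    int fd;                  /* -1 while disconnected */
    struct sockaddr_in addr; /* logger address, kept for reconnecting */
} logger_conn;

/* Remember where the logger listens. Returns 0 on success, -1 on bad IP. */
int logger_init(logger_conn *lc, const char *ip, uint16_t port)
{
    memset(lc, 0, sizeof(*lc));
    lc->fd = -1;
    lc->addr.sin_family = AF_INET;
    lc->addr.sin_port = htons(port);
    return inet_pton(AF_INET, ip, &lc->addr.sin_addr) == 1 ? 0 : -1;
}

/* Open the TCP connection if it is not already open. */
int logger_connect(logger_conn *lc)
{
    if (lc->fd >= 0)
        return 0;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&lc->addr, sizeof(lc->addr)) < 0) {
        close(fd);
        return -1;
    }
    lc->fd = fd;
    return 0;
}

/* Close the connection so no external socket is open at checkpoint time. */
void logger_disconnect(logger_conn *lc)
{
    if (lc->fd >= 0) {
        shutdown(lc->fd, SHUT_RDWR);
        close(lc->fd);
        lc->fd = -1;
    }
}

/*
 * Close the logger socket, run the checkpoint step, then reconnect.
 * do_checkpoint is a placeholder for the real trigger (assumption:
 * something like dmtcp_checkpoint()); NULL means "no checkpoint step".
 * Returns the callback's result, or -1 if reconnecting failed.
 */
int checkpoint_without_logger(logger_conn *lc, int (*do_checkpoint)(void))
{
    logger_disconnect(lc);
    int rc = do_checkpoint ? do_checkpoint() : 0;
    if (logger_connect(lc) < 0)
        return -1;
    return rc;
}
```

Since the reconnect happens after the callback returns, the same code path
would run both when the process resumes and when it is restarted from the
checkpoint image. Each MPI rank that talks to the logger would have to do this.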

I am trying to use dmtcp_event_hook, but the event
DMTCP_EVENT_WRITE_CKPT seems to be delivered only after the following
warning:

[42000] WARNING at kernelbufferdrainer.cpp:124 in onTimeoutInterval;
REASON='JWARNING(false) failed'
     _dataSockets[i]->socket().sockfd() = 14
     buffer.size() = 0
     WARN_INTERVAL_SEC = 10
Message: Still draining socket... perhaps remote host is not running under
DMTCP?

Is there a way to capture an event before that warning is printed?

Thanks a lot!!!

Edson

On Oct 26, 2015 5:25 PM, "Jiajun Cao" <jia...@ccs.neu.edu> wrote:

> Hi Edson,
>
> The error is expected. DMTCP considers the computation as a whole,
> i.e., all processes involved in a computation must run under
> DMTCP. Technically, this is because DMTCP must handle the network
> communication. At the time of a checkpoint, DMTCP needs to drain the data
> in the sockets so that there won't be any lost data in-flight. In your
> case, the other side of the socket is not under the control of DMTCP.
>
> Also, if possible, could you tell us what kind of application you are
> running? I haven't tested DMTCP on MPI applications communicating with the
> external world. This could be a good test case for us.
>
>
> Best,
> Jiajun
>
> On Mon, Oct 26, 2015 at 6:46 AM, Edson Tavares de Camargo <
> etcamarg...@gmail.com> wrote:
>
>> Hi Everyone!
>>
>>
>> I have a question: what is the expected behaviour of DMTCP when it is used
>> on an MPI application that exchanges messages with another application
>> that is not running under dmtcp_launch?
>>
>> I ask because I get an error when I run an MPI application that
>> exchanges messages via TCP with another application. Both applications are
>> running on my cluster, but I only need to checkpoint the MPI
>> application. The error is the following:
>>
>> ========
>> WARNING at kernelbufferdrainer.cpp:120 in onTimeoutInterval;
>> REASON='JWARNING(false) failed'
>>      _dataSockets[i]->socket().sockfd() = 15
>>      buffer.size() = 1059
>>      WARN_INTERVAL_SEC = 10
>> Message: Still draining socket... perhaps remote host is not running
>> under DMTCP?
>> =======
>>
>> Thanks!
>>
>> Edson
>> -------
>>
>>
>>
>>
>> ------------------------------------------------------------------------------
>>
>> _______________________________________________
>> Dmtcp-forum mailing list
>> Dmtcp-forum@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
>>
>>
>
