Hi Edson,

For coordinator-less checkpointing, I would suggest that you use the
"--no-coordinator" flag with dmtcp_launch. This allows you to specify a
checkpoint interval. You can also provide a port number with "--port" and
then use dmtcp_command to request checkpoints explicitly.
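
As a rough sketch of those two options: the binary name ./test comes from
your mpiexec example, while the 60-second interval and port 7801 are made-up
values. The spellings "--interval", "-p" and "--checkpoint" are my
assumption and may differ between DMTCP versions, so please check
"dmtcp_launch --help" and "dmtcp_command --help" first. The script only
prints the commands; it does not run DMTCP.

```shell
#!/bin/sh
# Sketch only: builds and prints the command lines, nothing is executed.
APP=./test        # application binary, as in the mpiexec example
INTERVAL=60       # hypothetical checkpoint interval, in seconds
PORT=7801         # hypothetical port for on-demand checkpoint requests

# Option 1: coordinator-less launch with periodic checkpoints.
periodic_cmd="dmtcp_launch --no-coordinator --interval $INTERVAL $APP"

# Option 2: coordinator-less launch on a known port ...
ondemand_cmd="dmtcp_launch --no-coordinator --port $PORT $APP"
# ... followed by an explicit checkpoint request sent to that port.
request_cmd="dmtcp_command -p $PORT --checkpoint"

echo "$periodic_cmd"
echo "$ondemand_cmd"
echo "$request_cmd"
```

To try one for real, paste the printed line into a shell on a machine where
DMTCP is installed.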

Please let us know if you would like any more help with setting up your
experiments.

Best,
Kapil

On Wed, Oct 7, 2015 at 1:06 PM, Edson Tavares de Camargo <
etcamarg...@gmail.com> wrote:

> Hi Everyone!
>
> I have run some tests here with versions 2.4.1 and 1.2.8. I have
> managed to checkpoint and restart a simple MPI application running on
> my machine (basically, each process increments a number). DMTCP
> is a very nice tool!
>
> Nausca, I have some questions:
>
> > You have to find an old version of DMTCP (1.x). That version does not
> > require a coordinator.
>
> I have tested version 1.2.8, but a coordinator is still launched in the
> background. Do you know exactly which version does not need a
> coordinator?
>
> > I am working on this now.
> Are you working with DMTCP and MPI?
>
> > To make the latest version run as a single process, with no coordinator
> > and no dmtcp_launch needed.
> > But in this case, you have to link your application against the DMTCP
> > shared library (.so) files.
>
> I'm afraid I don't understand what you are saying. Could you give me an
> example?
>
> > If you just want to checkpoint each process individually, rather than
> > all processes together, maybe you can run a coordinator for each
> > dmtcp_launch and set the environment variable to a different coordinator.
>
> I would like to launch my mpiexec command, for example: mpiexec -np 8
> ./test, and have each of the 8 processes create a checkpoint, but with no
> coordination.
>
> Thanks a lot!!!
>
> Edson
>
>
> 2015-10-07 12:07 GMT+02:00 Nausca Hsu <nau...@cadence.com>:
>
>> Hi Edson,
>> Back in the old days, DMTCP was linked into the user application. A signal
>> handler triggered the checkpoint, and a checkpoint thread was created to
>> handle it, so there was no need for a coordinator.
>>
>> In the latest version, I am afraid you need a coordinator anyway. If you
>> don’t run a coordinator, dmtcp_launch will automatically bring one up
>> for you. This is the current behavior of 2.4.1.
>>
>> If you just want to checkpoint each process individually, rather than all
>> processes together, maybe you can run a coordinator for each dmtcp_launch
>> and set the environment variable to a different coordinator.
>>
>> Thanks.
>> Nausca.
>>
>>
>> From: Edson Tavares de Camargo <etcamarg...@gmail.com>
>> Date: Wednesday, October 7, 2015, 17:54
>> To: Nausca <nau...@cadence.com>
>> Cc: "Sourceforge. Net Dmtcp-Forum@Lists." <
>> dmtcp-forum@lists.sourceforge.net>
>> Subject: Re: [Dmtcp-forum] Uncoordinated checkpoint for MPI
>>
>> Hi Nausca,
>>
>> Thank you for your reply!
>>
>> Let me see if I understood correctly. With an older version (1.x), my
>> system will be able to create uncoordinated checkpoints among
>> processes. Then, if I run:
>>
>> - <dmtcp command> mpirun -np 8 ./test - where each process executes on a
>> different machine
>>
>> each of those processes will create its own checkpoint, right?
>>
>> > But in this case, you have to link your source code with dmtcp library
>> > so files.
>>
>> How could I do that? Would I have to call the function dmtcp Checkpoint()
>> in the application code?
>>
>> Thanks a lot!
>>
>> Edson
>>
>> 2015-10-07 11:28 GMT+02:00 Nausca Hsu <nau...@cadence.com>:
>>
>>> Hi,
>>> You have to find an old version of DMTCP (1.x). That version does not
>>> require a coordinator.
>>> I am working on this now: making the latest version run as a single
>>> process, with no coordinator and no dmtcp_launch needed.
>>>
>>> But in this case, you have to link your application against the DMTCP
>>> shared library (.so) files.
>>>
>>> Thanks.
>>> Nausca.
>>>
>>> From: Edson Tavares de Camargo <etcamarg...@gmail.com>
>>> Date: Wednesday, October 7, 2015, 16:32
>>> To: "Sourceforge. Net Dmtcp-Forum@Lists." <
>>> dmtcp-forum@lists.sourceforge.net>
>>> Subject: [Dmtcp-forum] Uncoordinated checkpoint for MPI
>>>
>>> Hi Everyone!
>>>
>>> This is my first contact with DMTCP. I'm a PhD student working on a
>>> message logging protocol for MPI, and I'm using OpenMPI to implement
>>> my proposal. I have read the DMTCP documentation and I have a few
>>> questions. But first, let me explain why I would like to use a
>>> checkpoint tool:
>>>
>>> - My message logging protocol assumes that processes create checkpoints
>>> using an uncoordinated approach. Each process creates its checkpoints
>>> independently of the others. There is no coordination among the processes.
>>>
>>> - For now, I am not worried about process recovery. That will be part
>>> of the next phase of my work.
>>>
>>> Now my questions about DMTCP.
>>>
>>> - There is a coordinator. It is responsible for starting the checkpoints
>>> on the other processes, right? DMTCP follows a coordinated checkpoint
>>> approach and creates a consistent global state, correct?
>>>
>>> - Would it be possible to use DMTCP, or a DMTCP plugin, to implement
>>> uncoordinated checkpointing? For now, I just need each process to take
>>> its checkpoints independently.
>>>
>>> Thank you in advance!
>>>
>>> Edson
>>>
>>
>>
>
>
> _______________________________________________
> Dmtcp-forum mailing list
> Dmtcp-forum@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
>
>