Hi Everyone!

I have run some tests with versions 2.4.1 and 1.2.8. I managed to
checkpoint and restart a simple MPI application running on my machine
(basically, each process increments a number). DMTCP is a very nice tool!

Nausca, I have some questions:

> You have to find an old version of DMTCP (1.x). In that version, no
> coordinator is required.

I have tested version 1.2.8, but a coordinator is still launched in the
background. Do you know exactly which version doesn't need a coordinator?

> I am working on this now.
Are you working with DMTCP and MPI?

> To make the latest version run as a single process, with no coordinator
> needed. And no dmtcp_launch needed either.
> But in this case, you have to link your source code with the DMTCP shared
> library (.so) files.

I'm afraid I don't understand what you mean. Could you give me an
example?

> If you just want to checkpoint each process individually, rather than
> all processes together, maybe you can run a coordinator for each
> dmtcp_launch and set the environment variable to a different coordinator.

I would like to launch my mpiexec command, for example: mpiexec -np 8 ./test
and have each of the 8 processes create a checkpoint, but with no
coordination.
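
A minimal sketch of the "one coordinator per dmtcp_launch" suggestion quoted
above, written as a small Python wrapper that mpiexec starts once per rank.
The script name per_rank_launch.py and the base port 7800 are hypothetical.
The environment variable names DMTCP_COORD_HOST and DMTCP_COORD_PORT and the
--new-coordinator flag are assumed from the 2.4 series and should be checked
against `dmtcp_launch --help` for the installed version. OMPI_COMM_WORLD_RANK
is the rank variable set by Open MPI; other MPI implementations use a
different name. Whether a rank can be checkpointed cleanly while its peers
sit outside its DMTCP computation is not verified here.

```python
import os
import sys

BASE_PORT = 7800  # hypothetical; any free port range works


def coordinator_env(rank, base_port=BASE_PORT, host="localhost"):
    """Return environment settings pointing one rank at its own coordinator.

    Variable names are an assumption based on the DMTCP 2.4 series.
    """
    return {
        "DMTCP_COORD_HOST": host,
        "DMTCP_COORD_PORT": str(base_port + rank),
    }


def launch_command(app_argv):
    """Wrap the application in dmtcp_launch, asking for a fresh coordinator.

    The --new-coordinator flag is assumed; verify it for your version.
    """
    return ["dmtcp_launch", "--new-coordinator"] + list(app_argv)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Open MPI exports the rank of each launched process in this variable.
    rank = int(os.environ.get("OMPI_COMM_WORLD_RANK", "0"))
    env = dict(os.environ)
    env.update(coordinator_env(rank))
    cmd = launch_command(sys.argv[1:])
    # Replace this wrapper process with dmtcp_launch running the application.
    os.execvpe(cmd[0], cmd, env)
```

It would be started as: mpiexec -np 8 python per_rank_launch.py ./test
Each rank then talks only to the coordinator on its own port, so a checkpoint
request sent to one coordinator (for example with dmtcp_command and the
matching port) should affect that rank alone.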

Thanks a lot!!!

Edson


2015-10-07 12:07 GMT+02:00 Nausca Hsu <nau...@cadence.com>:

> Hi Edson,
> Back in the old days,
> DMTCP was linked into the user application.
> A signal handler was used to trigger the checkpoint.
> A checkpoint thread was created to handle the checkpoint.
> So there was no need for a coordinator.
>
> In the latest version, I am afraid you need a coordinator anyway.
> If you don’t run the coordinator, dmtcp_launch will automatically bring up
> a coordinator for you.
> This is the current behavior of 2.4.1.
>
> If you just want to checkpoint each process individually, rather than
> all processes together, maybe you can run a coordinator for each
> dmtcp_launch and set the environment variable to a different coordinator.
>
> Thanks.
> Nausca.
>
>
> From: Edson Tavares de Camargo <etcamarg...@gmail.com>
> Date: Wednesday, October 7, 2015 17:54
> To: Nausca <nau...@cadence.com>
> Cc: "Sourceforge. Net Dmtcp-Forum@Lists." <
> dmtcp-forum@lists.sourceforge.net>
> Subject: Re: [Dmtcp-forum] Uncoordinated checkpoint for MPI
>
> Hi Nausca,
>
> Thank you for your reply!
>
> Let me see if I understood correctly. Using an older version (1.x), my
> system will be able to create non-coordinated checkpoints among
> processes. Then, if I run:
>
> - <dmtcp command> mpirun -np 8 ./test - where each process executes on a
> different machine
>
> each of those processes will create a checkpoint, right?
>
> > But in this case, you have to link your source code with the DMTCP
> > shared library (.so) files.
>
> How could I do that? Will I have to call the function dmtcpCheckpoint()
> in the application code?
>
> Thanks a lot!
>
> Edson
>
> 2015-10-07 11:28 GMT+02:00 Nausca Hsu <nau...@cadence.com>:
>
>> Hi,
>> You have to find an old version of DMTCP (1.x). In that version, no
>> coordinator is required.
>> I am working on this now.
>> To make the latest version run as a single process, with no coordinator
>> needed. And no dmtcp_launch needed either.
>>
>> But in this case, you have to link your source code with the DMTCP shared
>> library (.so) files.
>>
>> Thanks.
>> Nausca.
>>
>> From: Edson Tavares de Camargo <etcamarg...@gmail.com>
>> Date: Wednesday, October 7, 2015 16:32
>> To: "Sourceforge. Net Dmtcp-Forum@Lists." <
>> dmtcp-forum@lists.sourceforge.net>
>> Subject: [Dmtcp-forum] Uncoordinated checkpoint for MPI
>>
>> Hi Everyone!
>>
>> This is my first contact with DMTCP. I'm a PhD student working on a
>> message logging protocol for MPI. I'm using OpenMPI to implement my
>> proposal. I have read the DMTCP documentation and I have a few questions.
>> But first of all, I will tell you why I would like to use a checkpoint tool:
>>
>> - My message logging protocol assumes that processes create checkpoints
>> in an uncoordinated way. Each process creates a checkpoint independently
>> of the others. There will be no coordination among the processes.
>>
>> - For now, I am not worried about process recovery. This will be part
>> of the next phase of my work.
>>
>> Now my questions about DMTCP.
>>
>> - There is a coordinator. It is responsible for starting the checkpoints
>> on the other processes, right? DMTCP follows a coordinated checkpoint
>> approach and creates a consistent global state, ok?
>>
>> - Would it be possible to use DMTCP, or a DMTCP plugin, to implement
>> uncoordinated checkpointing? For the moment, I just want to take a
>> checkpoint independently on each process.
>>
>> Thank you in advance!
>>
>> Edson
>>
>
>
