Hi Edson,

For coordinator-less checkpointing, I would suggest that you use the "--no-coordinator" flag with dmtcp_launch. This allows you to specify a checkpoint interval. Further, you can also provide a port number with "--port" and then use dmtcp_command to request checkpoints explicitly.
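
As a rough sketch of the two options above (periodic checkpoints via an interval, or explicit requests via dmtcp_command), the helpers below only print the command lines so they can be reviewed before being run. The "--no-coordinator" and "--port" flags are the ones named above; "--interval" and "--checkpoint" are assumed spellings, and the values 60 and 7801 and the program ./test are placeholders. Option names can differ between DMTCP releases, so it is worth checking "dmtcp_launch --help" and "dmtcp_command --help" first.

```shell
#!/bin/sh
# Sketch only: these helpers PRINT the commands instead of running them.
# The interval (60), port (7801) and program (./test) are hypothetical.

# Print a coordinator-less launch command.
#   $1   = checkpoint interval in seconds (assumed flag: --interval)
#   $2   = port used later by dmtcp_command (flag named above: --port)
#   $3.. = program to run and its arguments
launch_cmd() {
    interval=$1
    port=$2
    shift 2
    echo "dmtcp_launch --no-coordinator --interval $interval --port $port $*"
}

# Print the command that explicitly requests a checkpoint from the
# process reachable on the given port (assumed flag: --checkpoint).
checkpoint_cmd() {
    echo "dmtcp_command --port $1 --checkpoint"
}

# Show both command lines for the placeholder values.
launch_cmd 60 7801 ./test
checkpoint_cmd 7801
```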
Please let us know if you would like any more help with setting up your experiments.

Best,
Kapil

On Wed, Oct 7, 2015 at 1:06 PM, Edson Tavares de Camargo <etcamarg...@gmail.com> wrote:

> Hi Everyone!
>
> I have run some tests here with versions 2.4.1 and 1.2.8. I have managed
> to checkpoint and restart a simple MPI application running on my machine
> (basically, each process increments a number). DMTCP is a very nice tool!
>
> Nausca, I have some questions:
>
> > You have to find an old version of DMTCP (1.x). In that version, no
> > coordinator is required.
>
> I have tested version 1.2.8, but a coordinator is still launched in the
> background. Do you know exactly which version doesn't need a coordinator?
>
> > I am working on this now:
>
> Are you working with DMTCP and MPI?
>
> > making the latest version run as a single process with no coordinator
> > needed, and no dmtcp_launch needed either.
> > But in this case, you have to link your source code with the DMTCP
> > library .so files.
>
> I'm afraid I don't understand what you are saying. Could you give me an
> example?
>
> > If you just want to checkpoint each process on its own, not every
> > process at once, maybe you can run a coordinator for each dmtcp_launch
> > and set the environment variable to point to a different coordinator.
>
> I would like to launch my mpiexec command, for example: mpiexec -np 8
> ./test, and have each of the 8 processes create a checkpoint, but with no
> coordination.
>
> Thanks a lot!!!
>
> Edson
>
> 2015-10-07 12:07 GMT+02:00 Nausca Hsu <nau...@cadence.com>:
>
>> Hi Edson,
>>
>> Back in the old days, DMTCP was linked into the user application.
>> A signal handler triggered the checkpoint, and a checkpoint thread was
>> created to handle it, so there was no need for a coordinator.
>>
>> In the latest version, I am afraid you need a coordinator anyway.
>> If you don't run a coordinator, dmtcp_launch will automatically bring
>> one up for you. This is the current behavior of 2.4.1.
>>
>> If you just want to checkpoint each process on its own, not every
>> process at once, maybe you can run a coordinator for each dmtcp_launch
>> and set the environment variable to point to a different coordinator.
>>
>> Thanks.
>> Nausca.
>>
>> From: Edson Tavares de Camargo <etcamarg...@gmail.com>
>> Date: Wednesday, October 7, 2015 17:54
>> To: Nausca <nau...@cadence.com>
>> Cc: "Sourceforge. Net Dmtcp-Forum@Lists." <dmtcp-forum@lists.sourceforge.net>
>> Subject: Re: [Dmtcp-forum] Uncoordinated checkpoint for MPI
>>
>> Hi Nausca,
>>
>> Thank you for your reply!
>>
>> Let me see if I understood correctly. Using an older version (1.x), my
>> system will be able to create non-coordinated checkpoints among
>> processes. Then, if I run:
>>
>> - <dmtcp command> mpirun -np 8 ./test - where each process executes on a
>> different machine
>>
>> I will have each one of those processes creating a checkpoint, OK?
>>
>> > But in this case, you have to link your source code with the DMTCP
>> > library .so files.
>>
>> How could I do that? Will I have to use the function dmtcp Checkpoint()
>> in the application code?
>>
>> Thanks a lot!
>>
>> Edson
>>
>> 2015-10-07 11:28 GMT+02:00 Nausca Hsu <nau...@cadence.com>:
>>
>>> Hi,
>>>
>>> You have to find an old version of DMTCP (1.x). In that version, no
>>> coordinator is required.
>>> I am working on this now: making the latest version run as a single
>>> process with no coordinator needed, and no dmtcp_launch needed either.
>>>
>>> But in this case, you have to link your source code with the DMTCP
>>> library .so files.
>>>
>>> Thanks.
>>> Nausca.
>>>
>>> From: Edson Tavares de Camargo <etcamarg...@gmail.com>
>>> Date: Wednesday, October 7, 2015 16:32
>>> To: "Sourceforge. Net Dmtcp-Forum@Lists." <dmtcp-forum@lists.sourceforge.net>
>>> Subject: [Dmtcp-forum] Uncoordinated checkpoint for MPI
>>>
>>> Hi Everyone!
>>>
>>> This is my first contact with DMTCP. I'm a PhD student and I'm working
>>> on a message logging protocol for MPI. I'm using OpenMPI to implement
>>> my proposal. I have read the DMTCP documentation and I have a few
>>> questions. But first of all, I will tell you why I would like to use a
>>> checkpoint tool:
>>>
>>> - My message logging protocol assumes that processes create checkpoints
>>> in an uncoordinated way. Each process creates a checkpoint
>>> independently of the others. There will be no coordination among the
>>> processes.
>>>
>>> - For now, I am not worried about process recovery. This will be part
>>> of the next phase of my work.
>>>
>>> Now my questions about DMTCP.
>>>
>>> - There is a coordinator. It is responsible for starting the checkpoints
>>> on the other processes, right? DMTCP follows a coordinated checkpoint
>>> approach and creates a consistent global state, OK?
>>>
>>> - Would it be possible to use DMTCP, or a DMTCP plugin, to implement
>>> uncoordinated checkpointing? For now, I just want to take a checkpoint
>>> independently on each process.
>>>
>>> Thank you in advance!
>>>
>>> Edson
>
> _______________________________________________
> Dmtcp-forum mailing list
> Dmtcp-forum@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum