Hi Andrew,

I have made some progress on vfork support, but it will take a bit more time.
I'll try to get a preliminary version out over the weekend.

Regarding VDSO errors, can you tell us a bit more about the DMTCP version
you are using?
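
When comparing the save and restart hosts, it may also help to record the
kernel release and the [vdso] mapping on each machine. Below is a minimal
sketch in C. It is not part of DMTCP, and the helper names find_vdso and
print_vdso_info are made up for illustration. It reads /proc/self/maps
instead of calling getauxval(), on the assumption that the glibc shipped
with CentOS 6 is too old to provide getauxval().

```c
#include <stdio.h>
#include <string.h>
#include <sys/utsname.h>

/* Hypothetical helper: locate the [vdso] mapping of the current process
 * by scanning /proc/self/maps. Returns 0 and fills start/end on success,
 * -1 if the file cannot be read or no [vdso] line is present. */
int find_vdso(unsigned long *start, unsigned long *end)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    char line[512];
    int found = -1;

    if (maps == NULL)
        return -1;
    while (fgets(line, sizeof line, maps) != NULL) {
        /* Each line starts with "<start>-<end>" in hexadecimal. */
        if (strstr(line, "[vdso]") != NULL &&
            sscanf(line, "%lx-%lx", start, end) == 2) {
            found = 0;
            break;
        }
    }
    fclose(maps);
    return found;
}

/* Hypothetical helper: print the kernel release and the vDSO range.
 * Returns 0 if both were found, -1 otherwise. */
int print_vdso_info(FILE *out)
{
    struct utsname u;
    unsigned long start = 0, end = 0;

    if (uname(&u) != 0)
        return -1;
    fprintf(out, "kernel: %s\n", u.release);
    if (find_vdso(&start, &end) != 0) {
        fprintf(out, "vdso: not found\n");
        return -1;
    }
    fprintf(out, "vdso: %lx-%lx (%lu bytes)\n", start, end, end - start);
    return 0;
}
```

Calling print_vdso_info(stdout) from a small main() on both hosts gives two
short reports to compare. A difference only shows that the two kernels
provide different vDSO images; it does not by itself explain the crash.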

Best,
Kapil


On Thu, Jan 30, 2020 at 10:45 AM Andrew Lynch <d...@cadence.com> wrote:

> Hi Kapil,
>
> Any further thoughts on our vfork issue?
>
>
>
> We also have another issue.  So far we are only seeing it on varieties of
> CentOS 6.10, but we seem to be having VDSO-related issues when we save on
> one kernel version (e.g. 2.6.32-754.14.2.el6.x86_64) and restart on another
> (e.g. 2.6.32-754.3.5.el6.x86_64).  In one case it manifests as a crash in
> gettimeofday when it tries to call __vdso_gettimeofday.
>
>
>
> Regards,
>
>    Drew
>
>
>
>
>
> [image: http://www.cadence.com/mail/footer_logocdns2.jpg]
>
> [image: Cadence Cares] <http://fortune.com/best-companies/cadence-52/>
>
> *Andrew T. Lynch*     |    Software Architect
>
> T: 408.914.6875   M: 408.832.1045    www.cadence.com
>
>
>
>
>
> *From: *Andrew Lynch <d...@cadence.com>
> *Date: *Friday, January 17, 2020 at 11:26 AM
> *To: *Kapil Arya <kapil.arya...@gmail.com>
> *Cc: *"dmtcp-forum@lists.sourceforge.net" <
> dmtcp-forum@lists.sourceforge.net>, Rodion Melnikov <rodi...@cadence.com>
> *Subject: *Re: [Dmtcp-forum] vfork usage
>
>
>
> Hi Kapil,
>
>   Thanks for your prompt response.  We have a significant number of vfork
> calls in our system, but not all of them occur when the system is likely to
> be using more than half of all available memory. Additionally, our system
> is mostly single threaded and checkpoints itself, so I don’t think we have
> the risk of a checkpoint arriving “in the middle of a fork”.  As such, I
> think it would be ok to block the other threads while the vfork call is in
> progress.  This is at least true in the one situation we are currently
> addressing.
>
>
>
> I’ve appended a simple program that demonstrates our usage.
>
>
>
> I look forward to your response.
>
>
>
> Regards,
>
>    Drew
>
>
>
> #include <errno.h>
> #include <signal.h>
> #include <stdint.h>
> #include <stdio.h>
> #include <sys/types.h>
> #include <sys/wait.h>
> #include <unistd.h>
>
> /* Child side: route stdout into the pipe, then exec the shell. */
> int sn_simple_popen_r_cfn(void *pipes_and_command) {
>     void **pnc = pipes_and_command;
>     int pipe_descs[2];
>     pipe_descs[0] = (int)(intptr_t)pnc[0];
>     pipe_descs[1] = (int)(intptr_t)pnc[1];
>     char *command = pnc[2];
>
>     dup2(pipe_descs[1], STDOUT_FILENO);
>
>     close(pipe_descs[1]);
>     close(pipe_descs[0]);
>     execl("/bin/sh", "sh", "-c", command, (char *) NULL);
>     _exit(-1); // Reached only if execl fails
> }
>
> /* PID of the most recently launched child; one child at a time. */
> static pid_t sn_popen_pid;
>
> FILE *sn_simple_popen_r(char *command) {
>     FILE *result = NULL;
>     int pipe_descs[2];
>     void *pipes_and_command[3];
>
>     if (pipe(pipe_descs) < 0)
>         return NULL;
>
>     pipes_and_command[0] = (void*)(intptr_t)pipe_descs[0];
>     pipes_and_command[1] = (void*)(intptr_t)pipe_descs[1];
>     pipes_and_command[2] = command;
>
>     if (-1 == (sn_popen_pid = vfork())) {
>         close(pipe_descs[0]);
>         close(pipe_descs[1]);
>         return NULL;
>     }
>     else if (sn_popen_pid == 0) {
>         /* Child: never returns (exec or _exit). */
>         sn_simple_popen_r_cfn(pipes_and_command);
>     }
>     else {
>         /* Parent: keep the read end, drop the write end. */
>         result = fdopen(pipe_descs[0], "r");
>         close(pipe_descs[1]);
>     }
>     return result;
> }
>
> void sn_simple_pclose_r(FILE *p) {
>     /* Non-blocking check: 0 means the child is still running. */
>     int stat = waitpid(sn_popen_pid, NULL, WNOHANG);
>
>     if (stat == 0) {
>         kill(sn_popen_pid, SIGKILL);
>         /* Reap the child, retrying if interrupted by a signal. */
>         while (-1 == waitpid(sn_popen_pid, NULL, 0)) {
>             if (errno != EINTR)
>                 break;
>         }
>     }
>     sn_popen_pid = 0;
>     fclose(p);
> }
>
>
>
>
>
>
>
>
>
>
>
>
> *From: *Kapil Arya <kapil.arya...@gmail.com>
> *Date: *Thursday, January 16, 2020 at 1:23 PM
> *To: *Andrew Lynch <d...@cadence.com>
> *Cc: *"dmtcp-forum@lists.sourceforge.net" <
> dmtcp-forum@lists.sourceforge.net>
> *Subject: *Re: [Dmtcp-forum] vfork usage
>
>
>
> Hi Drew,
>
>
>
> It's possible to support vfork, but it would require some work on our side.
> In the meantime, there might be a simpler way to support vfork for you. Can
> you provide a typical usage scenario that you have with vfork+exec? The most
> important thing for us is to find out what goes on between vfork and exec,
> both in the parent and in the child process. I know that the caller thread
> is going to be blocked, but what about the other threads and the child
> process? Would it be okay to, say, block the other threads while the vfork
> call is in progress?
>
>
>
> Generic vfork support is tricky because DMTCP uses some fancy tricks with
> atfork wrappers to make the coordinator aware of the to-be-created process,
> so that the new process doesn't miss a checkpoint command that arrives in
> the middle of a fork. There are alternatives, but I'd want to know more
> about the application before trying any changes :).
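
For readers unfamiliar with atfork hooks, the following is a minimal sketch
of the general mechanism using pthread_atfork(). It is not DMTCP's
implementation, and the function and counter names are invented for
illustration. It only shows that a library can run code in the parent just
before fork() and in both processes just after it, which is the kind of
hook a checkpointing layer could use to announce a new process.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Illustrative counters; each process has its own copy after fork(). */
static int prepare_calls, parent_calls, child_calls;

static void on_prepare(void) { prepare_calls++; } /* parent, before fork */
static void on_parent(void)  { parent_calls++; }  /* parent, after fork  */
static void on_child(void)   { child_calls++; }   /* child, after fork   */

/* Register the three hooks, fork once, and return the child's exit status.
 * The child exits with 0 if it observed exactly one prepare call (inherited
 * from the parent), one child call, and no parent call. Returns -1 on error.
 * Intended to be called once: every call registers the handlers again. */
int run_atfork_demo(void)
{
    int status;
    pid_t pid;

    if (pthread_atfork(on_prepare, on_parent, on_child) != 0)
        return -1;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        int ok = prepare_calls == 1 && child_calls == 1 && parent_calls == 0;
        _exit(ok ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Build with -pthread. According to the Linux vfork(2) man page, handlers
registered with pthread_atfork() are not called when a multithreaded NPTL
program calls vfork(), so hooks of this kind only cover the fork() path.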
>
>
>
> Best,
>
> Kapil
>
>
>
> On Thu, Jan 16, 2020 at 12:45 PM Andrew Lynch <d...@cadence.com> wrote:
>
> Hi Folks,
>
>   We are checkpointing very large processes that use more than half the
> available memory on our hosts.  We utilize vfork/exec to launch small
> processes.  Vfork is mapped to fork in execwrappers.cpp:
>
>
>
> extern "C" pid_t vfork()
> {
>   JTRACE("vfork wrapper calling fork");
>   // This might not preserve the full semantics of vfork.
>   // Used for checkpointing gdb.
>   return fork();
> }
>
>
>
> Fork checks for available memory to duplicate the process and fails if not
> enough memory exists (even though the required memory for the new process
> is very small).  Is there a way to use vfork?  Has anyone tried removing
> this mapping?
>
>
>
> Regards,
>
>    Drew
>
>
>
>
>
>
>
>
> _______________________________________________
> Dmtcp-forum mailing list
> Dmtcp-forum@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
>
>