Hi Andrew,

I have made some progress on vfork support, but it'll take a bit more time. I'll try to get a preliminary version out over the weekend.

Regarding the VDSO errors, can you tell us a bit more about the DMTCP version you are using?

Best,
Kapil

On Thu, Jan 30, 2020 at 10:45 AM Andrew Lynch <d...@cadence.com> wrote:

> Hi Kapil,
>
> Any further thoughts on our vfork issue?
>
> We also have another issue. So far we are only seeing it on varieties of
> CentOS 6.10, but we seem to be having VDSO-related issues when we save on
> one kernel version (e.g. 2.6.32-754.14.2.el6.x86_64) and restart on another
> (e.g. 2.6.32-754.3.5.el6.x86_64). In one case it manifests as a crash in
> gettimeofday when it tries to call __vdso_gettimeofday.
>
> Regards,
> Drew
>
> Andrew T. Lynch | Software Architect
> T: 408.914.6875 M: 408.832.1045 www.cadence.com
>
> From: Andrew Lynch <d...@cadence.com>
> Date: Friday, January 17, 2020 at 11:26 AM
> To: Kapil Arya <kapil.arya...@gmail.com>
> Cc: "dmtcp-forum@lists.sourceforge.net" <dmtcp-forum@lists.sourceforge.net>, Rodion Melnikov <rodi...@cadence.com>
> Subject: Re: [Dmtcp-forum] vfork usage
>
> Hi Kapil,
>
> Thanks for your prompt response. We have a significant number of vfork
> calls in our system, but not all of them occur when the system is likely to
> be using more than half of all available memory. Additionally, our system
> is mostly single threaded and checkpoints itself, so I don’t think we have
> the risk of a checkpoint arriving “in the middle of a fork”. As such, I
> think it would be ok to block the other threads while the vfork call is in
> progress. This is at least true in the one situation we are currently
> addressing.
>
> I’ve appended a simple program that demonstrates our usage.
>
> I look forward to your response.
>
> Regards,
> Drew
>
> int sn_simple_popen_r_cfn(void *pipes_and_command) {
>     void ** pnc = pipes_and_command;
>     int pipe_descs[2];
>     pipe_descs[0] = (int)(intptr_t)pnc[0];
>     pipe_descs[1] = (int)(intptr_t)pnc[1];
>     char *command = pnc[2];
>
>     dup2(pipe_descs[1], STDOUT_FILENO);
>
>     close(pipe_descs[1]);
>     close(pipe_descs[0]);
>     execl("/bin/sh", "sh", "-c", command, (char *) NULL);
>     _exit(-1); // Should never get here
> }
>
> static int sn_popen_pid;
>
> FILE *sn_simple_popen_r(char *command) {
>     FILE *result;
>     int pipe_descs[2];
>     void *pipes_and_command[3];
>
>     const int STACK_SIZE = 1024;
>     char* stack[STACK_SIZE];
>
>     if (pipe(pipe_descs) < 0)
>         return NULL;
>
>     pipes_and_command[0] = (void*)(intptr_t)pipe_descs[0];
>     pipes_and_command[1] = (void*)(intptr_t)pipe_descs[1];
>     pipes_and_command[2] = command;
>
>     if (-1 == (sn_popen_pid = vfork())) {
>         close(pipe_descs[0]);
>         close(pipe_descs[1]);
>         return NULL;
>     }
>     else if (sn_popen_pid == 0) {
>         sn_simple_popen_r_cfn(pipes_and_command);
>     }
>     else {
>         result = fdopen(pipe_descs[0], "r");
>         close(pipe_descs[1]);
>     }
>     return result;
> }
>
> void sn_simple_pclose_r(FILE *p) {
>     int stat = waitpid(sn_popen_pid, NULL, WNOHANG);
>
>     if (stat == 0) {
>         kill(sn_popen_pid, SIGKILL);
>         while (-1 == waitpid(sn_popen_pid, NULL, 0)) {
>             if (errno != EINTR)
>                 break;
>         }
>     }
>     sn_popen_pid = 0;
>     fclose(p);
> }
>
> From: Kapil Arya <kapil.arya...@gmail.com>
> Date: Thursday, January 16, 2020 at 1:23 PM
> To: Andrew Lynch <d...@cadence.com>
> Cc: "dmtcp-forum@lists.sourceforge.net" <dmtcp-forum@lists.sourceforge.net>
> Subject: Re: [Dmtcp-forum] vfork usage
>
> Hi Drew,
>
> It's possible to support vfork, but it would require some work on our side.
> In the meanwhile, there might be a simpler way to support vfork for you. Can
> you provide a typical usage scenario that you have with vfork+exec? The most
> important thing for us is to find out what would go on between vfork and
> exec, both in the parent and the child process. I know that the caller
> thread is going to be blocked, but what about the other threads and the
> child process? Would it be okay to, say, block the other threads while the
> vfork call is in progress?
>
> The reason why generic vfork support is tricky is that DMTCP tries
> some fancy tricks with atfork wrappers to make the coordinator aware of
> the to-be-created process so that the new process doesn't miss a checkpoint
> command that arrives in the middle of fork. There are alternatives, but I'd
> want to know more about the application before trying any changes :).
>
> Best,
> Kapil
>
> On Thu, Jan 16, 2020 at 12:45 PM Andrew Lynch <d...@cadence.com> wrote:
>
> Hi Folks,
>
> We are checkpointing very large processes that use more than half the
> available memory on our hosts. We utilize vfork/exec to launch small
> processes. Vfork is mapped to fork in execwrappers.cpp:
>
> extern "C" pid_t vfork()
> {
>   JTRACE("vfork wrapper calling fork");
>   // This might not preserve the full semantics of vfork.
>   // Used for checkpointing gdb.
>   return fork();
> }
>
> Fork checks for available memory to duplicate the process and fails if not
> enough memory exists (even though the required memory for the new process
> is very small). Is there a way to use vfork? Has anyone tried removing
> this mapping?
>
> Regards,
> Drew

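On the VDSO question raised in this thread, a small diagnostic may help compare the save host and the restart host. The following is only a minimal sketch and is not part of DMTCP. It assumes Linux with /proc mounted, and the function names (find_vdso_range, raw_gettimeofday, report_vdso_state) are made up for illustration. It prints the kernel release, the address range of the [vdso] mapping, and the time obtained through the raw system call. Only after that does it call the libc gettimeofday, which on x86_64 normally goes through the vDSO, so the earlier output is still visible if that last call crashes.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/time.h>
#include <sys/utsname.h>
#include <unistd.h>

/* Scan /proc/self/maps for the "[vdso]" entry of the current process.
 * Returns 0 and fills *start and *end on success, -1 if no entry is found. */
int find_vdso_range(unsigned long *start, unsigned long *end)
{
    char line[512];
    int found = -1;
    FILE *maps;

    assert(start != NULL && end != NULL);
    maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return -1;
    while (fgets(line, sizeof line, maps) != NULL) {
        if (strstr(line, "[vdso]") != NULL &&
            sscanf(line, "%lx-%lx", start, end) == 2) {
            found = 0;
            break;
        }
    }
    fclose(maps);
    return found;
}

/* Ask the kernel for the time through the raw system call,
 * which does not use the vDSO. */
int raw_gettimeofday(struct timeval *tv)
{
    return (int)syscall(SYS_gettimeofday, tv, NULL);
}

/* Print the kernel release, the vDSO range, and both clock readings.
 * The libc call comes last and the stream is flushed before it.
 * Returns 0 if both clock readings succeed, -1 otherwise. */
int report_vdso_state(FILE *out)
{
    struct utsname un;
    struct timeval tv;
    unsigned long start = 0, end = 0;

    if (uname(&un) == 0)
        fprintf(out, "kernel release: %s\n", un.release);

    if (find_vdso_range(&start, &end) == 0)
        fprintf(out, "[vdso] mapped at %#lx-%#lx\n", start, end);
    else
        fprintf(out, "[vdso] mapping not found\n");

    if (raw_gettimeofday(&tv) != 0)
        return -1;
    fprintf(out, "raw syscall:  %lld.%06ld\n",
            (long long)tv.tv_sec, (long)tv.tv_usec);
    fflush(out);

    if (gettimeofday(&tv, NULL) != 0)
        return -1;
    fprintf(out, "libc wrapper: %lld.%06ld\n",
            (long long)tv.tv_sec, (long)tv.tv_usec);
    return 0;
}
```

Calling report_vdso_state(stdout) once before the checkpoint and once after the restart would show whether the kernel release changed while the [vdso] range stayed the same, and whether the raw system call still works when the libc path does not.
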
_______________________________________________ Dmtcp-forum mailing list Dmtcp-forum@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dmtcp-forum