Hi Ramy,

I'm just checking back to see what the current status is. Were you able to overcome the 768-core limit? If not, how can we help you further?

If your organization allows it, one other possibility is to give us a guest account. We could then test DMTCP with Open MPI directly, give you a diagnosis, and hopefully also a fix.

Best wishes,
- Gene

On Tue, Aug 18, 2015 at 09:26:41AM +0000, Gad, Ramy wrote:
> Dear Rohan,
>
> Thank you for your reply.
>
> I will have a look at the publication.
>
> I will collect detailed information about my setup and get back to you as
> soon as possible.
>
> Best regards,
>
> Ramy Gad
> Johannes Gutenberg - Universität Mainz
> Zentrums für Datenverarbeitung (ZDV)
>
> Anselm-Franz-von-Bentzel-Weg 12
> 55128 Mainz
> Germany
> E-Mail: g...@uni-mainz.de
> Office Phone: +49-6131-39-26437
>
>
> ________________________________________
> From: Rohan Garg <rohg...@ccs.neu.edu>
> Sent: Monday, August 17, 2015 5:09 PM
> To: Gad, Ramy
> Cc: dmtcp-forum@lists.sourceforge.net; Süß, Dr. Tim; Nagel, Lars
> Subject: Re: [Dmtcp-forum] DMTCP scaling potential
>
> Hi Ramy,
>
> In the past we have tested with up to 2K cores. The results were
> published in HPDC-2014 [1]. We are currently doing scalability
> tests at Stampede [2], and have not noticed any issues up to
> 4K cores.
>
> The inability to scale beyond 768 cores could be a bug in DMTCP,
> or some configuration issue. My best guess (looking at the number 768)
> would be that there is a limit on the number of open file descriptors per
> process on the node where your coordinator is running.
>
> Could you give us more details of your setup? In particular, it’ll be helpful
> to know the following details:
>
> - DMTCP version
> - MPI library
> - Resource manager
> - Linux kernel version
> - Process limits (Try: ulimit -a)
>
> If it helps, we’d be happy to assist you in setting up your environment.
>
> [1]: http://www.ccs.neu.edu/home/gene/papers/hpdc14.pdf
> [2]: https://www.tacc.utexas.edu/stampede/
>
> Thanks,
> Rohan
>
>
> On Aug 17, 2015, at 4:48 AM, Gad, Ramy <g...@uni-mainz.de> wrote:
> >
> > Hi,
> >
> > We have used DMTCP to checkpoint several MPI applications, for example
> > mpiblast, ray, phylobayes and namd.
> > However, we were not able to scale beyond 768 cores.
> >
> > My questions are:
> >
> > Is there a limit on how far DMTCP can scale?
> >
> > Has anyone done any scaling tests? If so, are the results publicly
> > available?
> >
> > Can we scale beyond 1K cores with DMTCP?
> >
> > Best regards,
> >
> > Ramy Gad
> > Johannes Gutenberg - Universität Mainz
> > Zentrums für Datenverarbeitung (ZDV)
> >
> > Anselm-Franz-von-Bentzel-Weg 12
> > 55128 Mainz
> > Germany
> > E-Mail: g...@uni-mainz.de
> > Office Phone: +49-6131-39-26437
> >
> > ------------------------------------------------------------------------------
> > _______________________________________________
> > Dmtcp-forum mailing list
> > Dmtcp-forum@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
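
A minimal sketch for checking the file-descriptor hypothesis raised in Rohan's message above, offered as an illustration rather than as part of DMTCP. It assumes the coordinator keeps roughly one open socket per checkpointed process (an assumption, not confirmed in this thread). The function name `fd_headroom` and its `reserved` parameter are hypothetical. Run it as the same user, and on the same node, where the coordinator is started.

```python
import resource


def fd_headroom(expected_connections, reserved=64):
    """Compare the per-process open-file limit with an expected socket count.

    expected_connections: number of processes expected to connect
        (assumption: one descriptor per connected process).
    reserved: hypothetical allowance for stdio, log files, listeners, etc.
    Returns (soft_limit, hard_limit, ok).
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    needed = expected_connections + reserved
    # RLIM_INFINITY means "no limit"; its numeric value is platform-specific.
    ok = soft == resource.RLIM_INFINITY or soft >= needed
    return soft, hard, ok


if __name__ == "__main__":
    soft, hard, ok = fd_headroom(1024)
    print("soft limit:", soft)
    print("hard limit:", hard)
    print("enough for 1024 connections:", ok)
```

The soft limit reported here is the same value that `ulimit -n` prints in the shell. If it turns out to be too low, it can usually be raised up to the hard limit before starting the coordinator.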