Hi,
I was using Open MPI over Ethernet.

Ramy Gad
Johannes Gutenberg - Universität Mainz
Zentrums für Datenverarbeitung (ZDV)

Anselm-Franz-von-Bentzel-Weg 12
55128 Mainz
Germany
E-Mail: g...@uni-mainz.de
Office Phone: +49-6131-39-26437

________________________________
From: jonny...@gmail.com <jonny...@gmail.com> on behalf of Jiajun Cao <jia...@ccs.neu.edu>
Sent: Monday, August 17, 2015 5:49 PM
To: Gad, Ramy
Cc: Nagel, Lars; dmtcp-forum@lists.sourceforge.net; Süß, Dr. Tim; Rohan Garg
Subject: Re: [Dmtcp-forum] DMTCP scaling potential

Also, could you specify what kind of network you were using for
communication, i.e., Ethernet, InfiniBand, or something else?

Best,
Jiajun

On Mon, Aug 17, 2015 at 11:09 AM, Rohan Garg <rohg...@ccs.neu.edu> wrote:

Hi Ramy,

In the past we have tested with up to 2K cores. The results were
published in HPDC-2014 [1]. We are currently running scalability tests
on Stampede [2], and have not noticed any issues up to 4K cores.

The inability to scale beyond 768 cores could be a bug in DMTCP or a
configuration issue. My best guess (looking at the number 768) is that
there is a limit on the number of open file descriptors per process on
the node where your coordinator is running.

Could you give us more details about your setup? In particular, it would
be helpful to know the following:

 - DMTCP version
 - MPI library
 - Resource manager
 - Linux kernel version
 - Process limits (Try: ulimit -a)

If it helps, we'd be happy to assist you in setting up your environment.

[1]: http://www.ccs.neu.edu/home/gene/papers/hpdc14.pdf
[2]: https://www.tacc.utexas.edu/stampede/

Thanks,
Rohan

> On Aug 17, 2015, at 4:48 AM, Gad, Ramy <g...@uni-mainz.de> wrote:
>
> Hi,
>
> We have used DMTCP to checkpoint several MPI applications, for example
> mpiBLAST, Ray, PhyloBayes and NAMD.
> However, we were not able to scale beyond 768 cores.
>
> My questions are:
>
> Is there a limit on the maximum scale achievable with DMTCP?
>
> Has anyone done any scaling tests? If so, are the results publicly
> available?
>
> Can we scale beyond 1K cores with DMTCP?
>
> Best regards,
>
> Ramy Gad
> Johannes Gutenberg - Universität Mainz
> Zentrums für Datenverarbeitung (ZDV)
>
> Anselm-Franz-von-Bentzel-Weg 12
> 55128 Mainz
> Germany
> E-Mail: g...@uni-mainz.de
> Office Phone: +49-6131-39-26437
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Dmtcp-forum mailing list
> Dmtcp-forum@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
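
Rohan's guess in the thread above about a per-process limit on open file descriptors can be checked programmatically on the coordinator node, as an alternative to reading the output of `ulimit -a`. The following is only a minimal sketch: the helper names `nofile_limits` and `fits_within_limit`, the `reserved_fds` headroom of 16, and the idea that the coordinator needs roughly one descriptor per checkpointed process are all assumptions made for illustration, not something stated in the thread or taken from DMTCP itself.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def fits_within_limit(needed_fds, soft_limit, reserved_fds=16):
    """Check whether needed_fds descriptors fit under soft_limit.

    reserved_fds is a hypothetical allowance for stdin/stdout/stderr,
    log files and similar; adjust it for your environment.
    """
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return needed_fds + reserved_fds <= soft_limit


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print("soft limit:", soft, "hard limit:", hard)

    # Assumption for illustration: one descriptor per checkpointed process.
    for processes in (768, 1024, 4096):
        verdict = "fits" if fits_within_limit(processes, soft) else "exceeds the soft limit"
        print(processes, "processes:", verdict)
```

Run this in the same shell or job environment that launches the coordinator, since limits are inherited per process and may differ between login shells and batch jobs.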