Hi,

I was using Open MPI over Ethernet.


Ramy Gad
Johannes Gutenberg - Universität Mainz
Zentrums für Datenverarbeitung (ZDV)

Anselm-Franz-von-Bentzel-Weg 12
55128 Mainz
Germany
E-Mail: g...@uni-mainz.de
Office Phone: +49-6131-39-26437

________________________________
From: jonny...@gmail.com <jonny...@gmail.com> on behalf of Jiajun Cao 
<jia...@ccs.neu.edu>
Sent: Monday, August 17, 2015 5:49 PM
To: Gad, Ramy
Cc: Nagel, Lars; dmtcp-forum@lists.sourceforge.net; Süß, Dr. Tim; Rohan Garg
Subject: Re: [Dmtcp-forum] DMTCP scaling potential

Also, could you specify what kind of network you were using for communication, 
i.e., Ethernet, InfiniBand, or something else?

Best,
Jiajun

On Mon, Aug 17, 2015 at 11:09 AM, Rohan Garg <rohg...@ccs.neu.edu> wrote:
Hi Ramy,

In the past we have tested with up to 2K cores. The results were
published in HPDC-2014 [1]. We are currently doing scalability
tests at Stampede [2], and have not noticed any issues up to
4K cores.

The inability to scale beyond 768 cores could be a bug in DMTCP
or a configuration issue. My best guess (looking at the number 768)
is that there is a limit on the number of open file descriptors per
process on the node where your coordinator is running.
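
A minimal sketch of how one might check this limit on the coordinator
node, assuming Python is available there. The function name
fd_headroom, the reserve of 64 descriptors, and the idea of counting
one descriptor per connected process are illustrative assumptions, not
part of DMTCP:

```python
import resource


def fd_headroom(expected_connections, reserved=64):
    """Return the descriptors left under the soft limit.

    Assumes (hypothetically) one descriptor per connected process plus
    `reserved` descriptors for the process's own files and sockets.
    Returns None when no soft limit is in effect.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return None
    return soft - reserved - expected_connections


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("open-file soft limit:", soft)
    print("open-file hard limit:", hard)
    # 768 is the core count from the original report.
    print("headroom for 768 connections:", fd_headroom(768))
```

A negative headroom would be consistent with the guess above, although
it would not prove that the descriptor limit is the cause.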

Could you give us more details of your setup? In particular, it would
help to know the following:

 - DMTCP version
 - MPI library
 - Resource manager
 - Linux kernel version
 - Process limits (Try: ulimit -a)
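
If it is easier than pasting the ulimit output by hand, the kernel
version and a few process limits can be gathered in one go. This is
only a sketch: the function name collect_limits and the choice of
limits are assumptions, and it does not cover the DMTCP version, MPI
library, or resource manager, which still need to be reported
separately:

```python
import platform
import resource


def collect_limits():
    """Return the kernel release and selected (soft, hard) limits."""
    report = {"kernel": platform.release()}
    # Not every limit exists on every platform, so probe by name.
    for name in ("RLIMIT_NOFILE", "RLIMIT_NPROC", "RLIMIT_STACK", "RLIMIT_AS"):
        if hasattr(resource, name):
            report[name] = resource.getrlimit(getattr(resource, name))
    return report


if __name__ == "__main__":
    for key, value in collect_limits().items():
        print(key, value)
```

Running it on the node that hosts the coordinator is the most useful,
since that is where the limit in question would apply.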

If it helps, we'd be happy to assist you in setting up your environment.

[1]: http://www.ccs.neu.edu/home/gene/papers/hpdc14.pdf
[2]: https://www.tacc.utexas.edu/stampede/

Thanks,
Rohan

> On Aug 17, 2015, at 4:48 AM, Gad, Ramy <g...@uni-mainz.de> wrote:
>
> Hi,
>
> We have used DMTCP to checkpoint several MPI applications, for example
> mpiBLAST, Ray, PhyloBayes and NAMD.
> However, we were not able to scale beyond 768 cores.
>
> My questions are:
>
> Is there a limit on how far DMTCP can scale?
>
> Has anyone done any scaling tests? If so, are the results publicly
> available?
>
> Can we scale beyond 1K cores with DMTCP?
>
> Best regards,
>
> Ramy Gad
> Johannes Gutenberg - Universität Mainz
> Zentrums für Datenverarbeitung (ZDV)
>
> Anselm-Franz-von-Bentzel-Weg 12
> 55128 Mainz
> Germany
> E-Mail: g...@uni-mainz.de
> Office Phone: +49-6131-39-26437
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Dmtcp-forum mailing list
> Dmtcp-forum@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum

