Hey guys,

Using Roy's workaround so that partitioning doesn't happen with
ParallelMesh, I've been able to run some pretty big problems today, and
I thought I would share some numbers.  All I'm doing is solving pure
diffusion with a Dirichlet BC and a forcing function in 3D on
hexes... but I'm doing it completely matrix-free using the
NonlinearSystem class.
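
In case it helps picture the setup, here's a rough sketch of what this
looks like in plain libMesh.  The NonlinearSystem class I'm actually
using is separate code, so this just uses NonlinearImplicitSystem with a
residual-only callback; the element assembly loop is left out, and
header paths / constructor arguments differ between libMesh versions, so
treat it as a sketch rather than something to paste in:

// Hypothetical sketch: 3D diffusion on HEX8 elements, solved residual-only
// ("matrix-free") -- run with something like -snes_mf so PETSc approximates
// Jacobian-vector products by differencing the residual.
#include "libmesh/libmesh.h"
#include "libmesh/mesh.h"
#include "libmesh/mesh_generation.h"
#include "libmesh/equation_systems.h"
#include "libmesh/nonlinear_implicit_system.h"
#include "libmesh/nonlinear_solver.h"
#include "libmesh/numeric_vector.h"
#include "libmesh/enum_elem_type.h"
#include "libmesh/enum_order.h"

using namespace libMesh;

// Residual callback: the real version loops over the local elements, adds
// the grad(u).grad(phi) - f*phi contributions, and enforces the Dirichlet
// BC (e.g. via penalty).  Stubbed out here.
void compute_residual (const NumericVector<Number> & /*u*/,
                       NumericVector<Number> & R,
                       NonlinearImplicitSystem & /*sys*/)
{
  R.zero ();
  R.close ();
}

int main (int argc, char ** argv)
{
  LibMeshInit init (argc, argv);

  Mesh mesh (init.comm ());
  // Structured hex mesh of the unit cube; bump nx/ny/nz to grow the DOF count.
  MeshTools::Generation::build_cube (mesh, 100, 100, 100,
                                     0., 1., 0., 1., 0., 1., HEX8);

  EquationSystems es (mesh);
  NonlinearImplicitSystem & sys =
    es.add_system<NonlinearImplicitSystem> ("diffusion");
  sys.add_variable ("u", FIRST);

  // Only a residual is provided -- no Jacobian matrix is ever assembled.
  sys.nonlinear_solver->residual = compute_residual;

  es.init ();
  sys.solve ();

  return 0;
}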

The point of these runs is to do some parallel scaling tests.  From
previous runs with SerialMesh I knew that I needed more than 2 million
DOFs to see any kind of good parallel scaling over 128 procs.  Wanting
to get good scaling up to 1024 procs, I did some quick calculations
using some small problems on my 8 GB desktop and figured I could
fit 80 million DOFs on 128 procs (each proc has 2 GB of RAM) using
ParallelMesh... it turns out I was pretty far off!  In fact, I was
off so far that I ended up bringing down half of our supercomputer as
it started swapping like crazy!
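
For the curious, the back-of-envelope went roughly like the snippet
below.  The bytes-per-DOF number in it is a made-up placeholder rather
than anything I measured, and the 80% "usable memory" factor is just an
assumption -- the real inputs are whatever a small serial run tells you.

// Purely illustrative version of the kind of estimate described above.
// Nothing here is measured data: bytes_per_dof and usable_fraction are
// placeholders you'd replace with numbers from a small serial run.
#include <cstdio>

int main ()
{
  const double mem_per_proc_bytes = 2.0e9;   // ~2 GB of RAM per proc
  const double usable_fraction    = 0.8;     // headroom for OS, MPI buffers, etc. (assumed)
  const double bytes_per_dof      = 3000.0;  // hypothetical, NOT a measured value
  const int    nprocs             = 128;

  const double dofs_per_proc = mem_per_proc_bytes * usable_fraction / bytes_per_dof;
  std::printf ("~%.0f thousand DOFs/proc, ~%.1f million DOFs on %d procs\n",
               dofs_per_proc / 1.0e3, dofs_per_proc * nprocs / 1.0e6, nprocs);
  return 0;
}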

After rebooting a few nodes, I'm now running a slightly smaller
problem at 10 million DOFs.  I have it running on 64, 128, and 256
processors (the 512 and 1024 proc jobs are still in the queue).  This
gives me an interesting opportunity to look at the memory scaling of
ParallelMesh, since I don't have a matrix involved.

Here is how much each proc is using:

#CPUs : MB/proc (range)
  256 : 200-700
  128 : 350-700
   64 : 450-800

First thing to note is that the MB/proc column is a _range_.  The range
comes from watching "top" on one of the compute nodes: the memory usage
of each process _oscillates_ between the two numbers listed, roughly
every 5 seconds.  Based on watching "xosview" I believe that the high
numbers occur during communication steps, while the low numbers are
during "assembly" (residual computation).  At this point these are just
guesses, but I've been watching these things for a few weeks now and
kind of have a feel for what's going on.
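
If we want something better than eyeballing "top", each rank could just
read its own resident set size off /proc/self/status (Linux-only) and
print it, tagged with its MPI rank, before and after the communication
and assembly phases -- something like this untested helper:

#include <cstdio>
#include <fstream>
#include <sstream>
#include <string>

// Current resident set size of this process in MB (Linux-only), or -1 if
// /proc/self/status can't be read.
long current_rss_mb ()
{
  std::ifstream status ("/proc/self/status");
  std::string line;
  while (std::getline (status, line))
    if (line.compare (0, 6, "VmRSS:") == 0)
      {
        std::istringstream iss (line.substr (6));
        long kb = 0;
        iss >> kb;
        return kb / 1024;
      }
  return -1;
}

int main ()
{
  // In the real code this would be called right before and after the
  // communication and residual-assembly steps.
  std::printf ("current RSS: %ld MB\n", current_rss_mb ());
  return 0;
}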

Second thing to realize is that the upper number (700, 700, 800) is
about the same across all three runs.  This is somewhat of a bummer,
since it means that just adding more procs isn't going to let me run an
appreciably larger problem.  I'm guessing that we have some serialized
vectors that at this point are contributing much more memory than the
mesh is... for instance, a single fully serialized vector for a 10
million DOF problem is about 80 MB on every proc, no matter how many
procs you throw at it.

Anyway, I just thought I would share some data with everyone.  This is
by no means a request for anything to happen... nor is it a bug report
or anything of the sort.  The fact of the matter is that I will be
able to run something like 30 million DOFs on this computer without
a problem... and that is freaking sweet!  That will more than
satisfy my goals this year.  But... I am going to keep an eye out for
where some fat might be trimmed along the way... and maybe we can get
the memory usage scaling a bit better...

Derek
