Back of the envelope:

(Assuming you're using HEX27 elements... but the analysis won't be off by
much if you're using HEX20)

~60 first order nodes in each direction: 216,000 first order nodes
~120 second order nodes in each direction: 1,728,000 second order nodes

One first order variable: 216,000 degrees of freedom (DoFs)
Three second order variables: 5,184,000 DoFs
Total DoFs: 5,400,000
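
(If you want to redo that counting for your own mesh, here's the same
arithmetic as a little Python sketch.  It's just my illustration of the
numbers above, nothing libMesh-specific, and it uses 60/120 nodes per side
as round numbers; the exact counts would be 61^3 and 121^3.)

    # Back-of-the-envelope DoF count for a 60x60x60 HEX27 mesh with
    # three second order variables (u, v, w) and one first order variable (p).
    n_elem_per_side = 60

    first_order_nodes  = n_elem_per_side**3        # ~60^3  = 216,000
    second_order_nodes = (2 * n_elem_per_side)**3  # ~120^3 = 1,728,000

    total_dofs = 1 * first_order_nodes + 3 * second_order_nodes
    print(f"total DoFs: {total_dofs:,}")           # 5,400,000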

So... one solution/Krylov vector will be ~50MB (8 bytes * total DoFs).  By
default (if you're using PETSc with gmres) you'll get at least 30 Krylov
vectors plus 3 or so in libMesh for storing current, old and older
solutions... let's call it "40" vectors (to account for some overhead and
temporary copies of things etc.)... so a total of 2GB of RAM just for
solution vectors.
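
(In numbers, that's roughly the following.  Again just a sketch of the
estimate above; the 40-vector count is my fudge factor, not something PETSc
reports, and the result comes in just under the rounded 50MB / 2GB figures.)

    total_dofs = 5_400_000
    bytes_per_double = 8

    one_vector_mb = total_dofs * bytes_per_double / 1e6  # ~43 MB, call it ~50
    n_vectors = 40       # ~30 Krylov vectors + current/old/older + slop
    vectors_gb = n_vectors * one_vector_mb / 1e3         # ~1.7 GB, call it 2
    print(f"one vector ~{one_vector_mb:.0f} MB, "
          f"{n_vectors} vectors ~{vectors_gb:.1f} GB")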

Harder to calculate (but we can still ballpark it) is the amount of RAM the
Jacobian matrix will take up.  Let's go with the worst case: any interior
vertex degree of freedom will have 5^3 = 125 nonzeros per second order
variable (375 total) plus 27 more for the linear variable, so ~400 nonzeros
per row.  That means the Jacobian will take up about as much memory as 400
solution vectors: roughly 20GB.
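
(Spelling that out, and note this only counts the 8-byte matrix values; the
integer column indices and other sparse-matrix bookkeeping push it a bit
higher, which is why I round up to 20GB.)

    total_dofs = 5_400_000
    bytes_per_double = 8

    nnz_per_row = 3 * 5**3 + 3**3   # 375 second order + 27 first order = ~400
    jacobian_gb = total_dofs * nnz_per_row * bytes_per_double / 1e9
    print(f"~{nnz_per_row} nonzeros/row -> "
          f"Jacobian values ~{jacobian_gb:.1f} GB")   # ~17 GB, call it 20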

Depending on what preconditioner you choose, you might even end up with a
_copy_ of that Jacobian.  Using something like ILU with limited fill you
won't have much memory overhead... but using one of the HYPRE
preconditioners will net you a full copy.  To hedge our bets here I'm going
to add 25% for the preconditioner: 5GB.

So... the Jacobian matrix, preconditioner and other solution vectors
together are about 27GB of RAM.
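
(Adding up the pieces so far:)

    vectors_gb        = 2                       # Krylov + solution vectors
    jacobian_gb       = 20                      # matrix values
    preconditioner_gb = 0.25 * jacobian_gb      # the 25% hedge from above
    total_gb = vectors_gb + jacobian_gb + preconditioner_gb
    print(f"solver memory: ~{total_gb:.0f} GB")  # ~27 GB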

Now... let's look at the Mesh:

1,728,000 total nodes * 3 doubles (to store the coordinates) * 8 bytes: 42MB
Each element holds a pointer to its 27 nodes.  60^3 * 27 * 8 bytes: 47MB

Total: ~100MB.  There will be more memory than just this though... the Mesh
also stores quite a bit of information about degrees of freedom... etc.
Let's just multiply by 3 for a safe number: 300MB.  The actual number will
be different... but it will be on this order of magnitude...
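
(Same kind of sketch for the Mesh; the factor of 3 at the end is just the
safety margin above, not a measured number.)

    n_elem  = 60**3        # 216,000 elements
    n_nodes = 1_728_000    # all (second order) nodes
    bytes_per_double = bytes_per_pointer = 8

    node_coords_mb    = n_nodes * 3 * bytes_per_double / 1e6   # ~42 MB
    elem_node_ptrs_mb = n_elem * 27 * bytes_per_pointer / 1e6  # ~47 MB
    mesh_mb = 3 * (node_coords_mb + elem_node_ptrs_mb)  # ~265 MB, call it 300
    print(f"mesh: ~{mesh_mb:.0f} MB")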

Here's the final tally:

Solution vectors, Matrix and preconditioner: ~27GB
Mesh: ~300MB

As you can see... the Mesh is NOT your problem.  You don't need to worry
about the memory the Mesh will use unless you have tens of millions (or
even hundreds of millions) of elements.  216,000 elements is not anywhere
close.

If you want to run the problem size that you've proposed you're either
going to need a beefy workstation or, even better, a small cluster.  I
generally recommend keeping about 20,000 DoFs per processor... so you could
scale this problem all the way out to ~270 processors.
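
(That 20,000 DoFs per processor is just my rule of thumb, not a hard limit,
but it works out to:)

    total_dofs = 5_400_000
    dofs_per_process = 20_000    # rule of thumb
    print(f"scales out to ~{total_dofs // dofs_per_process} MPI processes")  # ~270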

The memory from the solution vectors, matrix and preconditioner will more
or less "scale" (i.e. it will be distributed across all of the processors)
while the memory for the mesh will be fixed when using
SerialMesh/ReplicatedMesh (so you'll have 300MB for each MPI process that
won't reduce as you spread the problem out).

Hope that helps...

Derek

On Wed, Jun 8, 2016 at 5:56 PM Cody Permann <codyperm...@gmail.com> wrote:

> On Wed, Jun 8, 2016 at 3:41 PM Xujun Zhao <xzha...@gmail.com> wrote:
>
> > Hi Cody,
> >
> > This sounds like the mesh data keeps a copy on each processor, but the
> > matrices and vectors are still stored in a distributed way. Is that correct?
> >
> Yes
>
>
> > I have a 3D Stokes problem with a 60x60x60 mesh, 2nd order elements for
> > velocity u, v, w, and first order for pressure p. In total that's about
> > 2.9M DoFs. This can run with 1, 2 and 3 CPUs. However, if I use 4 CPUs,
> > the program crashes with a segmentation fault as follows:
> >
> > If I run a smaller system, e.g. 25x25x25, it still works with 4 CPUs. Do
> > you think this is caused by memory due to the mesh duplication?
> >
> That's a good size problem for a single machine. You may very well be
> running out of memory here. I'd suggest that you open up another window and
> watch the memory usage for your smaller problem, then scale it up and watch
> it grow. You can always try switching to "DistributedMesh" to see if that
> helps. It will help a little, but it probably won't make as big of a
> difference as you might expect. It might be time to distribute your problem
> to a few nodes.
>
> Cody
>
>
> > ====================================================================================
> > =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> > =   PID 23903 RUNNING AT b461
> > =   EXIT CODE: 9
> > =   CLEANING UP REMAINING PROCESSES
> > =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> > ====================================================================================
> >
> > YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)
> >
> > This typically refers to a problem with your application.
> >
> > Please see the FAQ page for debugging suggestions
> >
> > On Wed, Jun 8, 2016 at 4:17 PM, Cody Permann <codyperm...@gmail.com>
> > wrote:
> >
> >> That's right!
> >>
> >> This is the classic space versus time tradeoff. In the bigger scheme of
> >> things, using a little more memory is usually fine on a modern system. The
> >> SerialMesh (now called ReplicatedMesh) is quite a bit faster. I think the
> >> general consensus is: use ReplicatedMesh until you are truly memory
> >> constrained AND you know that the bulk of the memory is in your mesh and
> >> not your matrices and vectors and everything else.
> >>
> >> Cody
> >>
> >> On Wed, Jun 8, 2016 at 2:40 PM Xujun Zhao <xzha...@gmail.com> wrote:
> >>
> >>> Hi all,
> >>>
> >>> I am curious about SerialMesh running with multiple CPUs. If I have 1
> >>> node with 16 cores on the cluster, will "mpirun -n 16" lead to 16 copies
> >>> of SerialMesh? If so, it looks like running on multiple CPUs will
> >>> require more memory?
> >>>
> >>> Thanks.
> >>> Xujun
> >>>
> >>>