Derek,

Excellent analysis! It really helps.
I also looked at the PETSc log summary when running on 1 CPU: the maximum
memory PETSc allocated is about 22.2G (close to what you predicted).
However, the total memory usage is up to 100G, which is much more than
expected. I think something in my API code must be taking a huge amount of
memory. I will double-check it. Thanks again.
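
In case it helps pin that down, here is a rough sketch of the kind of check I
plan to add around my own calls (assuming PETSc's PetscMallocGetCurrentUsage
and PetscMemoryGetCurrentUsage routines; the first only counts allocations
made through PetscMalloc, and may need the -malloc option in an optimized
PETSc build):

#include <petscsys.h>

// Compare what PETSc has allocated itself against the resident memory of the
// whole process.  A large gap means the extra ~80G is coming from outside
// PETSc (libMesh, my own API code, etc.).
PetscErrorCode report_memory(const char *where)
{
  PetscErrorCode ierr;
  PetscLogDouble petsc_bytes = 0.0, process_bytes = 0.0;

  ierr = PetscMallocGetCurrentUsage(&petsc_bytes);   CHKERRQ(ierr);  // PetscMalloc'd bytes
  ierr = PetscMemoryGetCurrentUsage(&process_bytes); CHKERRQ(ierr);  // resident set size

  ierr = PetscPrintf(PETSC_COMM_WORLD, "[%s] PETSc: %.1f GB, process: %.1f GB\n",
                     where, petsc_bytes / 1.0e9, process_bytes / 1.0e9);
  CHKERRQ(ierr);
  return 0;
}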

-Xujun

On Thu, Jun 9, 2016 at 11:40 AM, Derek Gaston <[email protected]> wrote:

> Back of the envelope:
>
> (Assuming you're using HEX27 elements... but the analysis won't be off by
> much if you're using HEX20)
>
> ~60 first order nodes in each direction: 216,000 first order nodes
> ~120 second order nodes in each direction: 1,728,000 second order nodes
>
> One first order variable: 216,000 degrees of freedom (DoFs)
> Three second order variables: 5,184,000 DoFs
> Total DoFs: 5,400,000
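>
> A quick sketch of that count (plain C++ constants, same rounded ~60/~120
> nodes-per-direction approximation as above; the true totals are slightly
> larger since 60 elements give 61 vertices per edge):
>
>   // Back-of-envelope DoF count for the 60x60x60 HEX27 mesh.
>   constexpr long n = 60;                              // elements per direction
>   constexpr long first_order_nodes  = n * n * n;      // ~216,000 vertices
>   constexpr long second_order_nodes = 8 * n * n * n;  // ~1,728,000 nodes total
>
>   constexpr long total_dofs =
>       1 * first_order_nodes      // one first-order variable (p)
>     + 3 * second_order_nodes;    // three second-order variables (u, v, w)
>
>   static_assert(total_dofs == 5400000, "matches the 5.4M above");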
>
> So... one solution/Krylov vector will be ~50MB (8 bytes * total DoFs).  By
> default (if you're using PETSc with gmres) you'll get at least 30 Krylov
> vectors plus 3 or so in libMesh for storing current, old and older
> solutions... let's call it "40" vectors (to account for some overhead and
> temporary copies of things etc.)... so a total of 2GB of RAM just for
> solution vectors.
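>
> The same sort of sketch for this step (just restating the arithmetic above):
>
>   // Memory for the solution/Krylov vectors (8-byte doubles).
>   constexpr double one_vector_bytes = 8.0 * 5.4e6;  // 8 bytes * total DoFs, ~43MB ("~50MB")
>   constexpr double n_vectors        = 40.0;         // ~30 GMRES + current/old/older + slop
>   constexpr double vector_gb        = n_vectors * one_vector_bytes / 1.0e9;  // ~1.7, call it 2GB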
>
> Harder to calculate (but we can still ballpark it) is the amount of RAM
> the Jacobian matrix will take up.  Let's go with worst case.  Any interior
> vertex degree of freedom will have 5^3=125 nonzeros per second order
> variable (375 total) and 27 more for the linear variable so: ~400 nonzeros
> per row.  That means the Jacobian will take up about as much memory as 400
> solution vectors: 20GB.
>
> Depending on what preconditioner you choose you might even end up with a
> _copy_ of that Jacobian.  Using something like ILU with limited fill you
> won't have much memory overhead... but using one of the HYPRE
> preconditioners will net you a full copy.  To hedge our bets here I'm going
> to add in a 25% memory addition for the preconditioner: 5GB
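>
> Sketching the matrix and preconditioner part the same way (the 25% is the
> same rough hedge as above, not a measured number):
>
>   // Jacobian: ~400 nonzero doubles per row, one row per DoF.
>   constexpr double nnz_per_row = 3.0 * 125.0 + 27.0;  // three biquadratic vars + one linear
>   constexpr double jacobian_gb = nnz_per_row * 8.0 * 5.4e6 / 1.0e9;  // ~17, call it 20GB
>
>   // ILU with limited fill adds little; HYPRE keeps roughly a full copy.
>   constexpr double precond_gb  = 0.25 * jacobian_gb;  // rough 25% hedge: ~5GB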
>
> So... the Jacobian matrix, preconditioner and other solution vectors
> together are about 27GB of RAM.
>
> Now... let's look at the Mesh:
>
> 1,728,000 total nodes * 3 doubles (to store the coordinates) * 8 bytes:
> 42MB
> Each element holds a pointer to its 27 nodes.  60^3 * 27 * 8 bytes: 47MB
>
> Total: ~100MB.  There will be more memory than just this though... the
> Mesh also stores quite a bit of information about degrees of freedom...
> etc.  Let's just multiply by 3 for a safe number: 300MB.  The actual number
> will be different... but it will be on this order of magnitude...
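>
> And the mesh, in the same style (the factor of 3 is the same safety margin
> as above):
>
>   // Mesh storage: node coordinates plus element -> node pointers.
>   constexpr double node_coords_mb = 1728000.0 * 3 * 8 / 1.0e6;  // ~42MB
>   constexpr double elem_ptrs_mb   = 216000.0 * 27 * 8 / 1.0e6;  // ~47MB
>   constexpr double mesh_mb = 3.0 * (node_coords_mb + elem_ptrs_mb);  // ~270, call it 300MB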
>
> Here's the final tally:
>
> Solution vectors, Matrix and preconditioner: ~27GB
> Mesh : ~300MB
>
> As you can see... the Mesh is NOT your problem.  You don't need to worry
> about the memory the Mesh will use unless you have tens of millions (or
> even hundreds of millions) of elements.  216,000 elements is not anywhere
> close.
>
> If you want to run the problem size that you've proposed you're either
> going to need a beefy workstation or, even better, a small cluster.  I
> generally recommend keeping about 20,000 DoFs per processor... so you could
> scale this problem all the way out to ~270 processors.
>
> The memory from the solution vectors, matrix and preconditioner will more
> or less "scale" (i.e. it will be distributed across all of the processors)
> while the memory for the mesh will be fixed when using
> SerialMesh/ReplicatedMesh (so you'll have 300MB for each MPI process that
> won't reduce as you spread the problem out).
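>
> For reference, switching is just a different mesh class at construction
> time.  A minimal sketch (I'm assuming a generated cube here rather than
> however you actually set up your mesh):
>
>   #include "libmesh/libmesh.h"
>   #include "libmesh/replicated_mesh.h"   // one full copy of the mesh per rank
>   #include "libmesh/distributed_mesh.h"  // mesh split up across the ranks
>   #include "libmesh/mesh_generation.h"
>   #include "libmesh/enum_elem_type.h"
>
>   int main (int argc, char ** argv)
>   {
>     libMesh::LibMeshInit init (argc, argv);
>
>     // Swap in ReplicatedMesh instead to keep the ~300MB copy on every rank.
>     libMesh::DistributedMesh mesh (init.comm(), /*dim=*/3);
>
>     libMesh::MeshTools::Generation::build_cube (mesh, 60, 60, 60,
>                                                 0., 1., 0., 1., 0., 1.,
>                                                 libMesh::HEX27);
>
>     // ~5.4M DoFs at ~20k DoFs per rank suggests up to ~270 MPI ranks.
>     return 0;
>   }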
>
> Hope that helps...
>
> Derek
>
> On Wed, Jun 8, 2016 at 5:56 PM Cody Permann <[email protected]> wrote:
>
>> On Wed, Jun 8, 2016 at 3:41 PM Xujun Zhao <[email protected]> wrote:
>>
>> > Hi Cody,
>> >
>> > This sounds like the mesh data keeps a copy on each processor, but the
>> > matrices and vectors are still stored in a distributed fashion. Is that
>> > correct?
>>
>> Yes
>>
>>
>> > I have a 3D Stokes problem with a 60x60x60 mesh, second-order elements
>> > for the velocity u, v, w, and first-order for the pressure p; in total
>> > about 2.9M DoFs. This runs with 1, 2, and 3 CPUs. However, if I use 4
>> > CPUs, the program crashes with a segmentation fault as follows:
>> >
>> > If I run a smaller system, e.g. 25x25x25, it still works on 4 CPUs. Do
>> > you think this is caused by running out of memory due to the mesh
>> > duplication?
>> >
>> That's a good-sized problem for a single machine. You may very well be
>> running out of memory here. I'd suggest that you open up another window and
>> watch the memory usage for your smaller problem, then scale it up and watch
>> it grow. You can always try switching to "DistributedMesh" to see if that
>> helps. It will help a little, but it probably won't make as big of a
>> difference as you might expect. It might be time to distribute your problem
>> to a few nodes.
>>
>> Cody
>>
>>
>> >
>> > ===================================================================================
>> > =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> > =   PID 23903 RUNNING AT b461
>> > =   EXIT CODE: 9
>> > =   CLEANING UP REMAINING PROCESSES
>> > =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>> > ===================================================================================
>> >
>> > YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)
>> > This typically refers to a problem with your application.
>> > Please see the FAQ page for debugging suggestions
>> >
>> > On Wed, Jun 8, 2016 at 4:17 PM, Cody Permann <[email protected]>
>> > wrote:
>> >
>> >> That's right!
>> >>
>> >> This is the classic space versus time tradeoff. In the bigger scheme of
>> >> things, using a little more memory is usually fine on a modern system.
>> >> The SerialMesh (now called ReplicatedMesh) is quite a bit faster. I think
>> >> the general consensus is: use ReplicatedMesh until you are truly memory
>> >> constrained AND you know that the bulk of the memory is in your mesh and
>> >> not your matrices and vectors and everything else.
>> >>
>> >> Cody
>> >>
>> >> On Wed, Jun 8, 2016 at 2:40 PM Xujun Zhao <[email protected]> wrote:
>> >>
>> >>> Hi all,
>> >>>
>> >>> I am curious about SerialMesh running with multiple CPUs. If I have one
>> >>> node with 16 cores on the cluster, will "mpirun -n 16" lead to 16 copies
>> >>> of the SerialMesh? If so, it looks like running on multiple CPUs will
>> >>> require more memory?
>> >>>
>> >>> Thanks.
>> >>> Xujun
>> >>>
>> >>
>> >
>>
>>
>