Derek, excellent analysis! It really helps. I took a further look at the PETSc log summary when running on 1 CPU: the maximum memory PETSc allocated is about 22.2 GB (close to what you predicted). However, the total memory usage is up to 100 GB, which is much more than expected. I think something in my API must be taking a huge amount of memory; I will double-check it. Thanks again.

-Xujun
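One way to pin down where the gap between PETSc's tracked allocations and the total usage comes from is to query both numbers from inside the application at a few points (e.g. after mesh setup, after assembly, after the solve). A minimal sketch, assuming PETSc is already initialized; the helper name is made up:

#include <petscsys.h>

// Hypothetical helper: compare PETSc-tracked allocations with the total
// resident memory of the process.  The gap is memory PETSc never saw
// (libMesh mesh data, MPI buffers, your own API, ...).
PetscErrorCode report_memory(MPI_Comm comm, const char *label)
{
  PetscErrorCode ierr;
  PetscLogDouble malloced, resident;

  // Memory currently obtained through PetscMalloc (Vec, Mat, KSP, ...);
  // PETSc's malloc tracking may need to be enabled (e.g. a debug build).
  ierr = PetscMallocGetCurrentUsage(&malloced); CHKERRQ(ierr);

  // Resident set size of the whole process.
  ierr = PetscMemoryGetCurrentUsage(&resident); CHKERRQ(ierr);

  ierr = PetscPrintf(comm, "[%s] PETSc malloc: %.2f GB   resident: %.2f GB\n",
                     label, malloced / 1.0e9, resident / 1.0e9); CHKERRQ(ierr);
  return 0;
}

Comparing the two numbers at a few points should show whether the extra ~78 GB lives in PETSc objects or somewhere else.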
On Thu, Jun 9, 2016 at 11:40 AM, Derek Gaston <[email protected]> wrote:
> Back of the envelope:
>
> (Assuming you're using HEX27 elements... but the analysis won't be off by
> much if you're using HEX20)
>
> ~60 first order nodes in each direction: 216,000 first order nodes
> ~120 second order nodes in each direction: 1,728,000 second order nodes
>
> One first order variable: 216,000 degrees of freedom (DoFs)
> Three second order variables: 5,184,000 DoFs
> Total DoFs: 5,400,000
>
> So... one solution/Krylov vector will be ~50MB (8 bytes * total DoFs). By
> default (if you're using PETSc with gmres) you'll get at least 30 Krylov
> vectors plus 3 or so in libMesh for storing current, old and older
> solutions... let's call it "40" vectors (to account for some overhead and
> temporary copies of things etc.)... so a total of 2GB of RAM just for
> solution vectors.
>
> Harder to calculate (but we can still ballpark it) is the amount of RAM
> the Jacobian matrix will take up. Let's go with worst case. Any interior
> vertex degree of freedom will have 5^3=125 nonzeros per second order
> variable (375 total) and 27 more for the linear variable so: ~400 nonzeros
> per row. That means the Jacobian will take up about as much memory as 400
> solution vectors: 20GB.
>
> Depending on what preconditioner you choose you might even end up with a
> _copy_ of that Jacobian. Using something like ILU with limited fill you
> won't have much memory overhead... but using one of the HYPRE
> preconditioners will net you a full copy. To hedge our bets here I'm going
> to add in a 25% memory addition for the preconditioner: 5GB
>
> So... the Jacobian matrix, preconditioner and other solution vectors
> together are about 27GB of RAM.
>
> Now... let's look at the Mesh:
>
> 1,728,000 total nodes * 3 doubles (to store the coordinates) * 8 bytes: 42MB
> Each element holds a pointer to its 27 nodes. 60^3 * 27 * 8 bytes: 47MB
>
> Total: ~100MB. There will be more memory than just this though... the
> Mesh also stores quite a bit of information about degrees of freedom...
> etc. Let's just multiply by 3 for a safe number: 300MB. The actual number
> will be different... but it will be on this order of magnitude...
>
> Here's the final tally:
>
> Solution vectors, Matrix and preconditioner: ~27GB
> Mesh: ~300MB
>
> As you can see... the Mesh is NOT your problem. You don't need to worry
> about the memory the Mesh will use unless you have tens of millions (or
> even hundreds of millions) of elements. 216,000 elements is not anywhere
> close.
>
> If you want to run the problem size that you've proposed you're either
> going to need a beefy workstation or, even better, a small cluster. I
> generally recommend keeping about 20,000 DoFs per processor... so you could
> scale this problem all the way out to ~270 processors.
>
> The memory from the solution vectors, matrix and preconditioner will more
> or less "scale" (i.e. it will be distributed across all of the processors)
> while the memory for the mesh will be fixed when using
> SerialMesh/ReplicatedMesh (so you'll have 300MB for each MPI process that
> won't reduce as you spread the problem out).
>
> Hope that helps...
>
> Derek
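Derek's tally above can be reproduced (and re-run for other mesh sizes) with a few lines of stand-alone code. This is only an illustrative sketch of the same arithmetic; the 40-vector, 400-nonzeros-per-row, 25% preconditioner and 3x mesh factors are the ballpark assumptions from his email, and the slightly smaller totals come from not rounding one vector up to ~50 MB.

#include <cstdio>

int main()
{
  const double GB = 1.0e9;                         // decimal GB, as in the estimate above

  const long n = 60;                               // elements per direction
  const long first_order_nodes  = n * n * n;       // ~61^3, rounded down as above
  const long second_order_nodes = (2*n) * (2*n) * (2*n);

  // one first-order variable (p) + three second-order variables (u, v, w)
  const long dofs = first_order_nodes + 3 * second_order_nodes;

  const double vector   = 8.0 * dofs;              // one solution/Krylov vector (~43 MB)
  const double vectors  = 40 * vector;             // ~30 Krylov + solution copies + slack
  const double jacobian = 400 * vector;            // ~400 nonzeros per row, worst case
  const double precond  = 0.25 * jacobian;         // rough preconditioner overhead
  const double mesh     = 3.0 * (second_order_nodes * 3 * 8.0     // node coordinates
                                 + first_order_nodes * 27 * 8.0); // element->node pointers

  std::printf("Total DoFs:       %ld\n", dofs);
  std::printf("Solution vectors: %5.1f GB\n", vectors  / GB);
  std::printf("Jacobian:         %5.1f GB\n", jacobian / GB);
  std::printf("Preconditioner:   %5.1f GB\n", precond  / GB);
  std::printf("Mesh (x3 factor): %5.2f GB\n", mesh     / GB);
  return 0;
}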
> On Wed, Jun 8, 2016 at 5:56 PM Cody Permann <[email protected]> wrote:
>
>> On Wed, Jun 8, 2016 at 3:41 PM Xujun Zhao <[email protected]> wrote:
>>
>> > Hi Cody,
>> >
>> > This sounds like the mesh data keeps a copy on each processor, but the
>> > matrices and vectors are still stored distributedly. Is it correct?
>>
>> Yes
>>
>> > I have a 3D Stokes problem with a 60x60x60 mesh, 2nd order elements for
>> > velocity u, v, w, and first order for pressure p. In total about 2.9M dofs.
>> > This can run with 1, 2 and 3 CPUs. However, if I use 4 CPUs, the program
>> > crashes with a segmentation fault as follows:
>> >
>> > If I run a smaller system, e.g. 25x25x25, it still works for 4 CPUs. Do
>> > you think this is caused by memory due to the mesh duplication?
>>
>> That's a good size problem for a single machine. You may very well be
>> running out of memory here. I'd suggest that you open up another window
>> and watch the memory usage for your smaller problem, scale it up and
>> watch it grow. You can always try switching to "DistributedMesh" to see
>> if that helps. It will help a little, but it probably won't make as big
>> of a difference as you might expect. It might be time to distribute your
>> problem to a few nodes.
>>
>> Cody
>>
>> > ===================================================================================
>> > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> > = PID 23903 RUNNING AT b461
>> > = EXIT CODE: 9
>> > = CLEANING UP REMAINING PROCESSES
>> > = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>> > ===================================================================================
>> > YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)
>> >
>> > This typically refers to a problem with your application.
>> > Please see the FAQ page for debugging suggestions
>> >
>> > On Wed, Jun 8, 2016 at 4:17 PM, Cody Permann <[email protected]> wrote:
>> >
>> >> That's right!
>> >>
>> >> This is the classic space versus time tradeoff. In the bigger scheme of
>> >> things, using a little more memory is usually fine on a modern system.
>> >> The SerialMesh (now called ReplicatedMesh) is quite a bit faster. I
>> >> think the general consensus is: use ReplicatedMesh until you are truly
>> >> memory constrained AND you know that the bulk of the memory is in your
>> >> mesh and not your matrices and vectors and everything else.
>> >>
>> >> Cody
>> >>
>> >> On Wed, Jun 8, 2016 at 2:40 PM Xujun Zhao <[email protected]> wrote:
>> >>
>> >>> Hi all,
>> >>>
>> >>> I am curious about SerialMesh running with multiple CPUs. If I have 1
>> >>> node with 16 cores on the cluster, will "mpirun -n 16" lead to 16
>> >>> copies of SerialMesh? If so, it looks like running on multiple CPUs
>> >>> will require more memory??
>> >>>
>> >>> Thanks.
>> >>> Xujun
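For completeness, Cody's suggestion above to try DistributedMesh is essentially a one-line change in a plain libMesh program. A rough sketch of what that looks like (a hypothetical example, not code from this thread):

#include "libmesh/libmesh.h"
#include "libmesh/distributed_mesh.h"
#include "libmesh/mesh_generation.h"
#include "libmesh/enum_elem_type.h"

int main(int argc, char ** argv)
{
  libMesh::LibMeshInit init(argc, argv);

  // DistributedMesh (formerly ParallelMesh): each MPI rank keeps only the
  // elements it owns plus ghosts, instead of a full copy of the mesh.
  // ReplicatedMesh (formerly SerialMesh) would be the drop-in alternative.
  libMesh::DistributedMesh mesh(init.comm());

  // The 60x60x60 HEX27 cube discussed above.
  libMesh::MeshTools::Generation::build_cube(mesh, 60, 60, 60,
                                             0., 1., 0., 1., 0., 1.,
                                             libMesh::HEX27);

  mesh.print_info();  // prints a summary of the mesh
  return 0;
}

Even so, as the discussion above makes clear, the matrices and vectors dominate for this problem size, so switching mesh types mostly buys back the ~300 MB of per-rank mesh storage.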
