Hi all, thanks for the many tips and suggestions, I really appreciate you spending your time and effort helping me out! I set up valgrind and kcachegrind, which I found exceptionally easy, and I can get started now - perfect! In case anyone reads this in the future: I had to use "mpirun -n 4 valgrind --tool=callgrind ./my_prog" (almost exactly what Wolfgang suggested).
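For future readers, here is roughly the full sequence I ended up with (just a sketch assuming bash; the --callgrind-out-file pattern with %p is simply one way to keep the per-rank profiles apart, and "my_prog" is a placeholder for your own executable):

    # keep each rank single-threaded so the profile is not skewed by extra OpenMP threads
    # (see also the note below about the additional cores during AMG setup/solve)
    export OMP_NUM_THREADS=1
    # run every MPI rank under callgrind; %p expands to the process id,
    # so each rank writes its own callgrind.out.<pid>
    mpirun -n 4 valgrind --tool=callgrind --callgrind-out-file=callgrind.out.%p ./my_prog
    # then open one rank's profile in kcachegrind
    kcachegrind callgrind.out.<pid>

Note that everything runs much slower under callgrind, so a small test case is advisable.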
And regarding the other issue, where more cores were being used during the AMG setup and solve with both Trilinos and PETSc: it seems Wolfgang was right. I set the environment variable as suggested with "export OMP_NUM_THREADS=1" before running the program, and now I see the expected behavior and no additional cores are recruited - buonissimo!

Thanks again everyone for the help!

Best regards
Richard

[email protected] wrote on Wednesday, August 4, 2021 at 09:33:40 UTC+2:

> Dear Richard,
> I recently attended a summer school where I learned about Paraver, a visual performance-analysis tool from the Barcelona Supercomputing Center. It also comes with a pre-processor and a post-processor. Check the link below.
>
> https://www.bsc.es/discover-bsc/organisation/scientific-structure/performance-tools
>
> Regards,
> Heena
>
> On Tue, Aug 3, 2021 at 3:58 PM [email protected] <[email protected]> wrote:
>
>> Dear all,
>> I have spent quite some time on our in-house CFD and FSI solvers, which are matrix-based and use deal.II, MPI, and the AMG packages of Trilinos and PETSc, all of which are so wonderfully accessible even for engineers like me. My computations so far have focused on problems with a relatively small DoF count - say, at most 10 million - and the number of MPI ranks was eyeballed, staying below 20. At this stage, I would like to know:
>>
>> a) Which (free) profiling tools can you recommend? I watched Wolfgang's video lecture on that topic, but I am looking for more opinions. I want to see which parts of the code take time, beyond the (already detailed) TimerOutput.
>>
>> b) If I simply use "mpirun -n 4 mycode" on a machine with 8 physical cores, why do both PETSc and Trilinos use all 8 cores during the AMG setup and solve? I observed this with htop, even when running an off-the-shelf "step-40.release" as included in the library. Does anyone else see that? It looks something like this during the AMG setup and solve for "mpirun -n 8 step-40":
>> [image: screenshot_trilinos_step40_mpirun_n_8.png]
>> It might be linked to the installation on the server, where I used candi. On my local machine, however, this does not happen.
>>
>> Any hints are very much welcome - thanks for reading and for any tips!
>>
>> Best regards & greetings from Graz
>> Richard
