On 2021-06-10 19:27, John Peterson wrote:
I recorded the "Active time" for the "Matrix Assembly Performance"
PerfLog
in introduction_ex4 running "./example-opt -d 3 -n 40" for both the
original codepath and your proposed change, averaging the results over
5
runs. The results were:
Original code, "./example-opt -d 3 -n 40"
import numpy as np
np.mean([3.91801, 3.93206, 3.94358, 3.97729, 3.90512]) = 3.93
Patch, "./example-opt -d 3 -n 40"
import numpy as np
np.mean([4.10462, 4.06232, 3.95176, 3.92786, 3.97992]) = 4.00
so I'd say the original code path is marginally (but still
statistically
significantly) faster, although keep in mind that matrix assembly is
only
about 21% of the total time for this example while the solve is about
71%.
Super interesting! I am sending you my benchmarks. I should say that I had
initially run only 2 benchmarks, and both came out faster with the
modifications. Now I find that:
- The original code is more efficient with `-n 40'
- The modified code is more efficient with `-n 15' and `mpirun -np 4'
- When I repeated the 5-run trial several times with `-n 15', the original
  code was sometimes more efficient, but the first and second runs with the
  modified code were always faster (my computer heating up? see the sketch
  after this list)
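To rule out that run-order / warm-up effect, something like the following
could interleave the two binaries instead of running all trials of one and
then all trials of the other. This is only a minimal sketch: the binary
paths are my assumption, and it times whole runs with wall-clock time rather
than the PerfLog "Active time".

import subprocess, time
import numpy as np

# Assumed locations of the two builds; adjust to the attached layout.
BINARIES = {"original": "./ex4/example-opt",
            "modified": "./ex4_mod/example-opt"}
ARGS = ["-d", "3", "-n", "15"]   # prepend mpirun -np 4 for the parallel case

wall = {name: [] for name in BINARIES}
for _ in range(5):               # 5 trials, alternating the two binaries
    for name, path in BINARIES.items():
        t0 = time.perf_counter()
        subprocess.run([path] + ARGS, check=True,
                       stdout=subprocess.DEVNULL)
        wall[name].append(time.perf_counter() - t0)

for name, times in wall.items():
    print(name, np.mean(times), np.std(times, ddof=1))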
The gains are really marginal in any case. It would be interesting to run
with -O3... (I just did [1]). The differences now seem a little more
substantial, and the modified code appears to be faster. I hope I have not
made any mistakes.
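For reference, this is roughly how I would compare the two 5-run samples you
quoted; the Welch t-test via scipy is my own addition, not something from
the attached scripts.

import numpy as np
from scipy import stats

# The five "Active time" values quoted above.
orig  = np.array([3.91801, 3.93206, 3.94358, 3.97729, 3.90512])
patch = np.array([4.10462, 4.06232, 3.95176, 3.92786, 3.97992])

print("original: mean %.3f  std %.3f" % (orig.mean(), orig.std(ddof=1)))
print("patch:    mean %.3f  std %.3f" % (patch.mean(), patch.std(ddof=1)))
print("relative difference: %.1f%%"
      % (100 * (patch.mean() - orig.mean()) / orig.mean()))

# Welch's t-test (does not assume equal variances).
t, p = stats.ttest_ind(orig, patch, equal_var=False)
print("Welch t-test: t = %.2f, p = %.3f" % (t, p))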
The code and the benchmarks are in the attached file.
- examples
  |- introduction
     |- ex4 (original code)
     |  |- output_*_.txt.bz2 (running -n 40 with -O2)
     |  |- output_15_*_.txt.bz2 (running -n 15 with -O2)
     |  |- output_40_O3_*_.txt.bz2 (running -n 40 with -O3)
     |- ex4_mod (modified code)
        |- output_*_.txt.bz2 (running -n 40 with -O2)
        |- output_15_*_.txt.bz2 (running -n 15 with -O2)
        |- output_40_O3_*_.txt.bz2 (running -n 40 with -O3)
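In case it is useful, here is roughly how the "Active time" can be pulled
out of those compressed logs for averaging. The helper and the regex are
mine, and the exact PerfLog line format is an assumption, so the pattern may
need adjusting.

import bz2, glob, re
import numpy as np

def active_times(pattern):
    # Grab the first "Active time" figure from each PerfLog dump.
    times = []
    for path in sorted(glob.glob(pattern)):
        with bz2.open(path, "rt", errors="replace") as f:
            m = re.search(r"Active time.*?(\d+\.\d+)", f.read())
        if m:
            times.append(float(m.group(1)))
    return np.array(times)

orig = active_times("examples/introduction/ex4/output_*_.txt.bz2")
mod  = active_times("examples/introduction/ex4_mod/output_*_.txt.bz2")
print("original:", orig.mean(), "modified:", mod.mean())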
[1] I compiled manually like this (I added -O3 instead of -O2; disregard
the CCFLAGS and the rest of the flags and include paths):
$ mpicxx -std=gnu++17 -DNDEBUG -march=amdfam10 -O3 \
  -felide-constructors -funroll-loops -fstrict-aliasing \
  -Wdisabled-optimization -fopenmp -I/usr/include -I/usr/include/curl -I \
  -I/usr/include -I/usr/include/eigen3 -I/usr/include/vtk \
  -I/usr/local/petsc/linux-c-opt/include \
  -I/usr/local/petsc/linux-c-opt//include -I/usr/include/superlu \
  -I/usr/local/include -I/usr/include/scotch -I/usr/include/tirpc -c \
  exact_solution.C -o exact_solution.x86_64-pc-linux-gnu.opt.o
$ mpicxx -std=gnu++17 -DNDEBUG -march=amdfam10 -O3 \
  -felide-constructors -funroll-loops -fstrict-aliasing \
  -Wdisabled-optimization -fopenmp -I/usr/include -I/usr/include/curl -I \
  -I/usr/include -I/usr/include/eigen3 -I/usr/include/vtk \
  -I/usr/local/petsc/linux-c-opt/include \
  -I/usr/local/petsc/linux-c-opt//include -I/usr/include/superlu \
  -I/usr/local/include -I/usr/include/scotch -I/usr/include/tirpc -c \
  introduction_ex4.C -o introduction_ex4.x86_64-pc-linux-gnu.opt.o
$ mpicxx -std=gnu++17 -march=amdfam10 -O3 -felide-constructors \
  -funroll-loops -fstrict-aliasing -Wdisabled-optimization -fopenmp \
  exact_solution.x86_64-pc-linux-gnu.opt.o \
  introduction_ex4.x86_64-pc-linux-gnu.opt.o -o example-opt -Wl,-rpath \
  -Wl,/usr/lib -Wl,-rpath -Wl,/lib -Wl,-rpath -Wl,/usr/lib -Wl,-rpath \
  -Wl,/usr/local/petsc/linux-c-opt/lib -Wl,-rpath -Wl,/usr/local/lib \
  -Wl,-rpath -Wl,/usr/include/scotch -Wl,-rpath -Wl,/usr/lib/openmpi \
  -Wl,-rpath -Wl,/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0 \
  /usr/lib/libHYPRE.so -L/usr/lib -lmesh_opt -ltimpi_opt -L/lib \
  -L/usr/local/petsc/linux-c-opt/lib -L/usr/local/lib \
  -L/usr/include/scotch -L/usr/lib/openmpi \
  -L/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0 -lhdf5_cpp -lcurl -lnlopt \
  -lglpk -lvtkIOCore -lvtkCommonCore -lvtkCommonDataModel -lvtkFiltersCore \
  -lvtkIOXML -lvtkImagingCore -lvtkIOImage -lvtkImagingMath \
  -lvtkIOParallelXML -lvtkParallelMPI -lvtkParallelCore \
  -lvtkCommonExecutionModel -lpetsc -lcmumps -ldmumps -lsmumps -lzmumps \
  -lmumps_common -lpord -lscalapack -lumfpack -lklu -lcholmod -lbtf \
  -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -lsuperlu \
  -lfftw3_mpi -lfftw3 -llapack -lblas -lopenblas -lesmumps -lptscotch \
  -lptscotcherr -lscotch -lscotcherr -lbz2 -lcgns -lmedC -lmed \
  -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lmetis -lz -lOpenCL \
  -lyaml -lhwloc -lX11 -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh \
  -lmpi -lgfortran -lm -lgcc_s -lpthread -lquadmath -lstdc++ -ldl -ltirpc \
  -fopenmp