On 2021-06-10 19:27, John Peterson wrote:
I recorded the "Active time" for the "Matrix Assembly Performance" PerfLog
in introduction_ex4 running "./example-opt -d 3 -n 40" for both the
original codepath and your proposed change, averaging the results over 5
runs. The results were:

Original code, "./example-opt -d 3 -n 40"
import numpy as np
np.mean([3.91801, 3.93206, 3.94358, 3.97729, 3.90512])  # ≈ 3.935

Patch, "./example-opt -d 3 -n 40"
np.mean([4.10462, 4.06232, 3.95176, 3.92786, 3.97992])  # ≈ 4.005

so I'd say the original code path is marginally (but still statistically significantly) faster, although keep in mind that matrix assembly is only about 21% of the total time for this example while the solve is about 71%.
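
As a rough check on the averages above and on how strong the difference is, the sketch below recomputes the two means with the Python standard library and forms Welch's t statistic for the two samples. The variable names are made up for illustration, the 21% share is the figure quoted above, and the choice of Welch's test is an assumption rather than the way the timings were originally compared.

```python
import math
import statistics

# "Active time" values (seconds) copied from the runs listed above.
original = [3.91801, 3.93206, 3.94358, 3.97729, 3.90512]
patched = [4.10462, 4.06232, 3.95176, 3.92786, 3.97992]


def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(b) - statistics.mean(a)) / se


mean_orig = statistics.mean(original)
mean_patch = statistics.mean(patched)
t_stat = welch_t(original, patched)

# Relative assembly slowdown, scaled by the ~21% share of total run time
# quoted above (assumes the rest of the run is unaffected by the patch).
assembly_slowdown = (mean_patch - mean_orig) / mean_orig
total_slowdown = 0.21 * assembly_slowdown

print(f"mean original = {mean_orig:.4f} s, mean patch = {mean_patch:.4f} s")
print(f"Welch t = {t_stat:.2f}")
print(f"assembly slowdown = {assembly_slowdown:.2%}, est. total = {total_slowdown:.2%}")
```

With only five runs per variant the statistic comes out a little under 2, so a few more runs per case would make the comparison more convincing either way.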

Very interesting. I am sending you my benchmarks. I must admit that I had initially run only 2 benchmarks, and both came out faster with the modifications. Now I have found that:
- The original code is more efficient with `-n 40'
- The modified code is more efficient with `-n 15' and `mpirun -np 4'
- When I repeated the 5-run trial several times, the original code was sometimes more efficient with `-n 15', but the first and second runs with the modified code were always faster (my computer heating up?)
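
If warm-up or heating effects are a concern, one option is to alternate the two executables instead of running all trials of one and then all trials of the other, so that any slow drift affects both equally. A minimal sketch follows; the function names and the executable paths in the comment are hypothetical, and only the command-line options come from this thread.

```python
import subprocess


def interleaved_order(labels, repeats):
    """Return the run order A, B, A, B, ... for the given labels."""
    return [label for _ in range(repeats) for label in labels]


def run_interleaved(commands, repeats, runner=None):
    """Run each labelled command `repeats` times in alternating order.

    `commands` maps a label to an argv list; the result maps each label
    to the list of captured stdout strings, one per run.
    """
    if runner is None:
        def runner(argv):
            done = subprocess.run(argv, capture_output=True, text=True, check=True)
            return done.stdout
    outputs = {label: [] for label in commands}
    for label in interleaved_order(list(commands), repeats):
        outputs[label].append(runner(commands[label]))
    return outputs


# Example call (paths are placeholders, not the real layout):
# logs = run_interleaved(
#     {"original": ["./ex4/example-opt", "-d", "3", "-n", "15"],
#      "modified": ["./ex4_mod/example-opt", "-d", "3", "-n", "15"]},
#     repeats=5,
# )
```

The `runner` argument exists only so the ordering logic can be exercised without launching the real example.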

The gains are marginal in any case. It would be interesting to run with -O3 (I just did [1]): the differences now seem slightly more substantial, with the modified code coming out faster. I hope I have not made any mistakes.

The code and the benchmarks are in the attached file.
- examples
|- introduction
 |- ex4                          (original code)
  |- output_*_.txt.bz2           (running -n 40 with -O2)
  |- output_15_*_.txt.bz2        (running -n 15 with -O2)
  |- output_40_O3_*_.txt.bz2     (running -n 40 with -O3)
 |- ex4_mod                      (modified code)
  |- output_*_.txt.bz2           (running -n 40 with -O2)
  |- output_15_*_.txt.bz2        (running -n 15 with -O2)
  |- output_40_O3_*_.txt.bz2     (running -n 40 with -O3)
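
For anyone who wants to recompute the averages from the compressed outputs, here is a minimal sketch. It assumes each file contains a PerfLog header line of the form "| Matrix Assembly Performance: Alive time=..., Active time=... |"; that format, the function names, and the regular expression are assumptions to adjust against the real logs.

```python
import bz2
import glob
import re
import statistics

# Assumed header line:
# | Matrix Assembly Performance: Alive time=..., Active time=... |
ACTIVE_RE = re.compile(r"Matrix Assembly Performance:.*?Active time=([0-9.eE+-]+)")


def active_time(path):
    """Return the assembly PerfLog active time (s) from one .bz2 log, or None."""
    with bz2.open(path, "rt") as fh:
        match = ACTIVE_RE.search(fh.read())
    return float(match.group(1)) if match else None


def mean_active_time(pattern):
    """Average the active times over all files matching a glob pattern."""
    times = [t for t in map(active_time, sorted(glob.glob(pattern))) if t is not None]
    return statistics.mean(times) if times else None


# Example call, using a pattern from the listing above:
# mean_active_time("examples/introduction/ex4/output_15_*_.txt.bz2")
```

Note that a shell-style pattern such as `output_*_.txt.bz2` may also match the `output_15_*` and `output_40_O3_*` files, depending on the actual file names, so the -n 40 / -O2 runs may need a narrower pattern.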


[1] I compiled manually like this (using -O3 in place of -O2; disregard the CCFLAGS et al.):

$ mpicxx -std=gnu++17 -DNDEBUG -march=amdfam10 -O3 -felide-constructors -funroll-loops -fstrict-aliasing -Wdisabled-optimization -fopenmp -I/usr/include -I/usr/include/curl -I -I/usr/include -I/usr/include/eigen3 -I/usr/include/vtk -I/usr/local/petsc/linux-c-opt/include -I/usr/local/petsc/linux-c-opt//include -I/usr/include/superlu -I/usr/local/include -I/usr/include/scotch -I/usr/include/tirpc -c exact_solution.C -o exact_solution.x86_64-pc-linux-gnu.opt.o

$ mpicxx -std=gnu++17 -DNDEBUG -march=amdfam10 -O3 -felide-constructors -funroll-loops -fstrict-aliasing -Wdisabled-optimization -fopenmp -I/usr/include -I/usr/include/curl -I -I/usr/include -I/usr/include/eigen3 -I/usr/include/vtk -I/usr/local/petsc/linux-c-opt/include -I/usr/local/petsc/linux-c-opt//include -I/usr/include/superlu -I/usr/local/include -I/usr/include/scotch -I/usr/include/tirpc -c introduction_ex4.C -o introduction_ex4.x86_64-pc-linux-gnu.opt.o

$ mpicxx -std=gnu++17 -march=amdfam10 -O3 -felide-constructors -funroll-loops -fstrict-aliasing -Wdisabled-optimization -fopenmp exact_solution.x86_64-pc-linux-gnu.opt.o introduction_ex4.x86_64-pc-linux-gnu.opt.o -o example-opt -Wl,-rpath -Wl,/usr/lib -Wl,-rpath -Wl,/lib -Wl,-rpath -Wl,/usr/lib -Wl,-rpath -Wl,/usr/local/petsc/linux-c-opt/lib -Wl,-rpath -Wl,/usr/local/lib -Wl,-rpath -Wl,/usr/include/scotch -Wl,-rpath -Wl,/usr/lib/openmpi -Wl,-rpath -Wl,/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0 /usr/lib/libHYPRE.so -L/usr/lib -lmesh_opt -ltimpi_opt -L/lib -L/usr/local/petsc/linux-c-opt/lib -L/usr/local/lib -L/usr/include/scotch -L/usr/lib/openmpi -L/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0 -lhdf5_cpp -lcurl -lnlopt -lglpk -lvtkIOCore -lvtkCommonCore -lvtkCommonDataModel -lvtkFiltersCore -lvtkIOXML -lvtkImagingCore -lvtkIOImage -lvtkImagingMath -lvtkIOParallelXML -lvtkParallelMPI -lvtkParallelCore -lvtkCommonExecutionModel -lpetsc -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -lsuperlu -lfftw3_mpi -lfftw3 -llapack -lblas -lopenblas -lesmumps -lptscotch -lptscotcherr -lscotch -lscotcherr -lbz2 -lcgns -lmedC -lmed -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lmetis -lz -lOpenCL -lyaml -lhwloc -lX11 -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgcc_s -lpthread -lquadmath -lstdc++ -ldl -ltirpc -fopenmp
