https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80859

--- Comment #16 from Thorsten Kurth <thorstenkurth at me dot com> ---
FYI, the code is:

https://github.com/zronaghi/BoxLib.git

in branch

cpp_kernels_openmp4dot5

and then in Src/LinearSolvers/C_CellMG

the file ABecLaplacian.cpp. For example, lines 542 and 543 can be commented out
or back in; when the test case is run, you get a significant slowdown with that
section commented in. I did not map all of the scalar variables, so that might
be part of the problem, but in any case it should not create copies of them at
all, in my opinion.
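To illustrate the kind of construct involved, here is a minimal sketch only,
not the actual lines 542/543 (the function name, variables, and stencil are
made up for this example). In OpenMP 4.5, scalars referenced inside a target
region are firstprivate by default, so they are copied at every target
invocation, while the arrays need explicit map clauses:

// Hypothetical sketch of an OpenMP 4.5 target construct of the kind
// discussed above; not the actual BoxLib code. The scalars alpha,
// beta, and n are firstprivate by default in OpenMP 4.5 and thus
// copied on every invocation; the arrays are mapped explicitly.
#include <cstddef>

void smooth_sketch(double* phi, const double* rhs,
                   double alpha, double beta, std::size_t n)
{
#pragma omp target teams distribute parallel for \
    map(tofrom: phi[0:n]) map(to: rhs[0:n])
    for (std::size_t i = 1; i + 1 < n; ++i) {
        // simple stencil update standing in for the real kernel
        phi[i] = alpha * rhs[i] + beta * (phi[i-1] + phi[i+1]);
    }
}

The per-invocation copies of the scalars should be cheap; the complaint above
is that copies seem to be created where, in my view, none should be needed at
all.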

Please don't look too closely at that code right now; it is a bit convoluted,
and I just wanted to show that this issue appears. With the target section I
mentioned above commented in, running:

#!/bin/bash

# disable nested parallelism and bind one thread per hardware thread,
# spread across the node
export OMP_NESTED=false
export OMP_NUM_THREADS=64
export OMP_PLACES=threads
export OMP_PROC_BIND=spread
export OMP_MAX_ACTIVE_LEVELS=1

execpath="/project/projectdirs/mpccc/tkurth/Portability/BoxLib/Tutorials/MultiGrid_C"
# pick up the MPI+OpenMP executable built in the tutorial directory
exec=`ls -latr ${execpath}/main3d.*.MPI.OMP.ex | awk '{print $9}'`

# execute
${exec} inputs

produces the following output:

tkurth@nid06760:/global/cscratch1/sd/tkurth/boxlib_omp45> ./run_example.sh
MPI initialized with 1 MPI processes
OMP initialized with 64 OMP threads
Using Dirichlet or Neumann boundary conditions.
Grid resolution : 128 (cells)
Domain size     : 1 (length unit) 
Max_grid_size   : 32 (cells)
Number of grids : 64
Sum of RHS      : -2.68882138776405e-17
----------------------------------------
Solving with BoxLib C++ solver 
WARNING: using C++ kernels in LinOp
WARNING: using C++ MG solver with C kernels
MultiGrid: Initial rhs                = 135.516568492921
MultiGrid: Initial residual           = 135.516568492921
MultiGrid: Iteration   1 resid/bnorm = 0.379119045820053
MultiGrid: Iteration   2 resid/bnorm = 0.0107971623268356
MultiGrid: Iteration   3 resid/bnorm = 0.000551321916982188
MultiGrid: Iteration   4 resid/bnorm = 3.55014555643671e-05
MultiGrid: Iteration   5 resid/bnorm = 2.57082340920002e-06
MultiGrid: Iteration   6 resid/bnorm = 1.90970439886018e-07
MultiGrid: Iteration   7 resid/bnorm = 1.44525222814178e-08
MultiGrid: Iteration   8 resid/bnorm = 1.10675190626368e-09
MultiGrid: Iteration   9 resid/bnorm = 8.55424251440489e-11
MultiGrid: Iteration   9 resid/bnorm = 8.55424251440489e-11
, Solve time: 5.84898591041565, CG time: 0.162226438522339
   Converged res < eps_rel*max(bnorm,res_norm)
   Run time      : 5.98936820030212
----------------------------------------
Unused ParmParse Variables:
[TOP]::hypre.solver_flag(nvals = 1)  :: [1]
[TOP]::hypre.pfmg_rap_type(nvals = 1)  :: [1]
[TOP]::hypre.pfmg_relax_type(nvals = 1)  :: [2]
[TOP]::hypre.num_pre_relax(nvals = 1)  :: [2]
[TOP]::hypre.num_post_relax(nvals = 1)  :: [2]
[TOP]::hypre.skip_relax(nvals = 1)  :: [1]
[TOP]::hypre.print_level(nvals = 1)  :: [1]
done.

When I comment it out and recompile, I get:

tkurth@nid06760:/global/cscratch1/sd/tkurth/boxlib_omp45> ./run_example.sh
MPI initialized with 1 MPI processes
OMP initialized with 64 OMP threads
Using Dirichlet or Neumann boundary conditions.
Grid resolution : 128 (cells)
Domain size     : 1 (length unit) 
Max_grid_size   : 32 (cells)
Number of grids : 64
Sum of RHS      : -2.68882138776405e-17
----------------------------------------
Solving with BoxLib C++ solver 
WARNING: using C++ kernels in LinOp
WARNING: using C++ MG solver with C kernels
MultiGrid: Initial rhs                = 135.516568492921
MultiGrid: Initial residual           = 135.516568492921
MultiGrid: Iteration   1 resid/bnorm = 0.379119045820053
MultiGrid: Iteration   2 resid/bnorm = 0.0107971623268356
MultiGrid: Iteration   3 resid/bnorm = 0.000551321916981978
MultiGrid: Iteration   4 resid/bnorm = 3.5501455563633e-05
MultiGrid: Iteration   5 resid/bnorm = 2.5708234090034e-06
MultiGrid: Iteration   6 resid/bnorm = 1.90970439781153e-07
MultiGrid: Iteration   7 resid/bnorm = 1.44525225042545e-08
MultiGrid: Iteration   8 resid/bnorm = 1.10675108045705e-09
MultiGrid: Iteration   9 resid/bnorm = 8.55424251440489e-11
MultiGrid: Iteration   9 resid/bnorm = 8.55424251440489e-11
, Solve time: 0.759385108947754, CG time: 0.14183521270752
   Converged res < eps_rel*max(bnorm,res_norm)
   Run time      : 0.879786014556885
----------------------------------------
Unused ParmParse Variables:
[TOP]::hypre.solver_flag(nvals = 1)  :: [1]
[TOP]::hypre.pfmg_rap_type(nvals = 1)  :: [1]
[TOP]::hypre.pfmg_relax_type(nvals = 1)  :: [2]
[TOP]::hypre.num_pre_relax(nvals = 1)  :: [2]
[TOP]::hypre.num_post_relax(nvals = 1)  :: [2]
[TOP]::hypre.skip_relax(nvals = 1)  :: [1]
[TOP]::hypre.print_level(nvals = 1)  :: [1]
done.

That is roughly a 7.7x slowdown in solve time (5.85 s vs. 0.76 s). The
smoothing kernel (red-black Gauss-Seidel) is the most expensive kernel in the
multigrid code, so the effect is biggest there, but the other kernels
(prolongation, restriction, dot products, etc.) show slowdowns as well,
amounting to a total of more than 10x for the whole application.
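For context, a generic red-black Gauss-Seidel sweep looks like the sketch
below (a 1D Poisson model problem only; BoxLib's actual kernel is 3D and more
involved). It is invoked many times per multigrid V-cycle, which is why any
per-invocation overhead shows up so strongly here:

// Generic 1D red-black Gauss-Seidel sweep for a Poisson model
// problem; a sketch only, not BoxLib's kernel. Points of one color
// depend only on points of the other color, so each colored sweep
// is fully parallel.
#include <cstddef>

void gsrb_sweep(double* phi, const double* rhs, double h2, std::size_t n)
{
    for (int color = 0; color < 2; ++color) {
#pragma omp parallel for
        for (std::size_t i = 1 + color; i + 1 < n; i += 2) {
            // u_i = (u_{i-1} + u_{i+1} + h^2 * f_i) / 2
            phi[i] = 0.5 * (phi[i-1] + phi[i+1] + h2 * rhs[i]);
        }
    }
}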
