On 2021-06-18 21:45, John Peterson wrote:
> Your compiler flags are definitely far more advanced/aggressive than
> mine,
I cannot take credit for that, really. I only changed -O2 to -O3, made
sure that -funroll-loops was present, and tuned the build for my
processor (amdfam10). All the other flags come directly from the
Makefile provided by libMesh.
> which are just on the default of -O2. However, I think what we should
> conclude from your results is that there is something slower than it
> needs to be with DenseMatrix::resize(), not that we should move the
> DenseMatrix creation/destruction inside the loop over elements. What I
> tried (see attached patch or the "dense_matrix_resize_no_virtual"
> branch in my fork) is avoiding the virtual function call to
> DenseMatrix::zero() which is currently made from
> DenseMatrix::resize(). In my testing, this change did not seem to make
> much of a difference but I'm curious about what you would get with
> your compiler args, this patch, and the unpatched ex4.
There _is_ a difference, for sure. I only ran the case with
`mpirun -np 4' and `-n 40'. The difference between the sums of the
times is on the order of 1 second. With only five runs of this size on
my rather limited system, I would say that your change yields
marginally faster computation and should be used; my modifications
should therefore be avoided.
In the interest of completeness, I need to say that I had to rebuild
libMesh, because of compilation errors. I don't quite remember what
version it is right now, but it is not the updated master branch (due to
some issues that I am having with my Internet connection). Although this
may not affect the comparison, it should be noted.
The results are shown below and in examples/introduction/sums.org
#+name: tbl-results
#+caption: Times in seconds. The first two columns correspond to the
#+caption: (patched) original code; the last two are the results with my
#+caption: modification (also with the patch). In each pair, the first
#+caption: column is alive time and the second is active time. Data was
#+caption: copied from the .bz2 files.
| 3.65205 | 1.292 | 3.63248 | 1.31057 |
| 4.82533 | 1.76303 | 5.31107 | 1.95794 |
| 5.05955 | 1.84457 | 5.26696 | 1.964 |
| 3.86126 | 1.40952 | 3.53834 | 1.29313 |
| 3.58892 | 1.30998 | 4.369 | 1.59834 |
#+caption: calculate the sums of each column
#+begin_src python :var data=tbl-results
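# Each row of data is [ex4_alive, ex4_active, ex4_mod_alive, ex4_mod_active].
ex4_alive = sum(row[0] for row in data)
ex4_active = sum(row[1] for row in data)
ex4_mod_alive = sum(row[2] for row in data)
ex4_mod_active = sum(row[3] for row in data)
# The None row is rendered by Org as a horizontal rule below the header.
return [["ex4_alive", "ex4_active", "ex4_mod_alive", "ex4_mod_active"],
        None,
        [ex4_alive, ex4_active, ex4_mod_alive, ex4_mod_active]]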
#+end_src
#+RESULTS:
| ex4_alive | ex4_active | ex4_mod_alive | ex4_mod_active |
|-----------+--------------------+--------------------+----------------|
| 20.98711 | 7.6190999999999995 | 22.117849999999997 | 8.12398 |
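
As a rough check on the "order of 1 second" figure, the sums can be turned into absolute and relative differences. This is only a minimal sketch: the timings are hard-coded from the table rather than passed in through Org, and the helper name `compare` is made up for illustration.

```python
# Timings in seconds, copied from tbl-results above.
# Columns: original alive, original active, modified alive, modified active.
data = [
    [3.65205, 1.292,   3.63248, 1.31057],
    [4.82533, 1.76303, 5.31107, 1.95794],
    [5.05955, 1.84457, 5.26696, 1.964],
    [3.86126, 1.40952, 3.53834, 1.29313],
    [3.58892, 1.30998, 4.369,   1.59834],
]


def compare(rows, orig_col, mod_col):
    """Return (orig_sum, mod_sum, difference, relative difference in %)."""
    orig = sum(row[orig_col] for row in rows)
    mod = sum(row[mod_col] for row in rows)
    diff = mod - orig
    return orig, mod, diff, 100.0 * diff / orig


# Positive differences mean the modified ex4 took longer in total.
for label, orig_col, mod_col in [("alive", 0, 2), ("active", 1, 3)]:
    orig, mod, diff, rel = compare(data, orig_col, mod_col)
    print(f"{label:6s} original={orig:.5f} modified={mod:.5f} "
          f"diff={diff:+.5f} s ({rel:+.1f}%)")
```

On these five runs the modified code is slower by roughly 1.13 s in summed alive time and 0.50 s in summed active time. The individual runs do not all point the same way (in the fourth run the modified code was faster on both measures), so with so few samples the margin should be read as small.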