Re: [deal.II] Scaling behavior of Matrix-Free test program

Daniel Arndt Mon, 02 Mar 2020 10:45:52 -0800

Maxi,

it would likely be interesting to run the STREAM benchmark on these two
systems to see how much memory bandwidth you can hope to get.
I would not be too surprised if memory bandwidth were still the limiting
factor for your test.
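For a quick first impression before building the real benchmark, here is a minimal sketch of STREAM's "triad" kernel. It is not the official STREAM code. `triad_bandwidth_gbs` is a hypothetical helper that uses plain `std::thread` instead of OpenMP. It also initializes the arrays serially, so on a multi-socket (NUMA) machine with a first-touch policy all pages end up on one socket. For numbers comparable to published STREAM results, run the actual benchmark with arrays much larger than the last-level cache.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper, not part of STREAM or deal.II: times a STREAM-like
// "triad" kernel a[i] = b[i] + scalar * c[i] split across num_threads
// std::threads and returns the best bandwidth (GB/s) over `repetitions` runs.
// On return, `a` holds the triad result so it can be checked.
double triad_bandwidth_gbs(std::size_t n, unsigned num_threads,
                           int repetitions, std::vector<double> &a)
{
  assert(n > 0 && num_threads > 0 && repetitions > 0);

  // Serial initialization: with a first-touch policy, all pages land on the
  // NUMA node of the calling thread (the official STREAM initializes in
  // parallel).
  a.assign(n, 0.0);
  const std::vector<double> b(n, 1.0);
  const std::vector<double> c(n, 2.0);
  const double scalar = 3.0;

  double best_seconds = 0.0;
  for (int rep = 0; rep < repetitions; ++rep)
    {
      const auto start = std::chrono::steady_clock::now();

      std::vector<std::thread> workers;
      workers.reserve(num_threads);
      for (unsigned t = 0; t < num_threads; ++t)
        {
          // Each thread processes the contiguous chunk [begin, end).
          const std::size_t begin = n * t / num_threads;
          const std::size_t end   = n * (t + 1) / num_threads;
          workers.emplace_back([&a, &b, &c, scalar, begin, end]() {
            for (std::size_t i = begin; i < end; ++i)
              a[i] = b[i] + scalar * c[i];
          });
        }
      for (std::thread &w : workers)
        w.join();

      const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
      // Keep the fastest repetition, as STREAM reports the best rate.
      if (rep == 0 || elapsed.count() < best_seconds)
        best_seconds = elapsed.count();
    }

  // STREAM counts 3 * sizeof(double) bytes per triad element:
  // two loads (b, c) and one store (a).
  const double bytes = 3.0 * sizeof(double) * static_cast<double>(n);
  return bytes / best_seconds / 1e9;
}
```

Try it with arrays far larger than the cache, e.g. `n = 50'000'000` (1.2 GB for the three arrays). Compare one thread against one thread per physical core. If the bandwidth saturates well before all cores are busy, a memory-bound matrix-free loop will also stop scaling at about that point.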


Best,
Daniel

On Mon, Mar 2, 2020 at 12:17 PM 'Maxi Miller' via deal.II User
Group <[email protected]> wrote:

> I wrote a small test program that solves a nonlinear equation using the
> RK4 solver implemented in deal.II and assembles the right-hand side using
> the matrix-free framework (code is attached). Afterwards, I wanted to check
> its scaling behavior, since it should serve as the basis for a larger
> program. Therefore, I ran several tests both on the development machine
> (i7-6700) with 8 threads and on the high-performance machine (E5-2650 v4
> <https://ark.intel.com/content/www/us/en/ark/products/91767/intel-xeon-processor-e5-2650-v4-30m-cache-2-20-ghz.html>)
> with 24 threads. Both machines were configured to use AVX extensions in
> deal.II, and the program itself was compiled in release mode.
>
> In both configurations, I compared the time it took to compute the first
> time step:
>
> Local machine (i7-6700):
> MPI processes    TBB threads    Time (s)
>             1              8         170
>             2              4          40
>             4              2          20
>
> HPC (E5-2650 v4):
> MPI processes    TBB threads    Time (s)
>             1             24         840
>             2             12         887
>             4              6         424
>             8              3          41
>            12              2          28
>            24              1          14
>
> I do not fully understand this behavior. Why is the code so much slower on
> the E5 than on the i7, except for the run with 24 MPI processes? Is it due
> to a different clock frequency, or to the newer microarchitecture
> (Broadwell vs. Skylake)? Why does going from 1 to 2 MPI processes on the i7
> give a speedup of about four, while going from 2 to 4 MPI processes only
> gives a factor of two (which is expected)?
> Similarly for the E5: going from 1 to 2 MPI processes does not speed up
> the code at all. Going from 2 to 4 halves the execution time (as
> expected), but going from 4 to 8 gives a factor of ten. The steps after
> that follow the expected pattern again.
>
> Is there an explanation for the observed behavior? If not, what can I do
> to investigate it further?
>
> Thanks!
>
> --
> The deal.II project is located at http://www.dealii.org/
> For mailing list/forum options, see
> https://groups.google.com/d/forum/dealii?hl=en
> ---
> You received this message because you are subscribed to the Google Groups
> "deal.II User Group" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/dealii/01ac3d77-d9bf-4197-a9a9-1f220c0af696%40googlegroups.com
> .
>
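The factors discussed in the quoted question can be made explicit with a short calculation. For each row of a timing table, it computes the speedup relative to the previous row, next to the ratio of MPI process counts (the factor the original poster treats as expected). This is a minimal sketch. `TimingRow`, `StepSpeedup` and `stepwise_speedups` are hypothetical names introduced only for illustration, and the input values are the timings reported in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One row of a timing table: number of MPI processes and wall time (s).
struct TimingRow
{
  int    mpi_processes;
  double seconds;
};

// Result for one step between consecutive rows.
struct StepSpeedup
{
  double measured;  // T_previous / T_current
  double mpi_ratio; // P_current / P_previous
};

// Returns one StepSpeedup per pair of consecutive rows.
std::vector<StepSpeedup> stepwise_speedups(const std::vector<TimingRow> &rows)
{
  std::vector<StepSpeedup> result;
  for (std::size_t k = 1; k < rows.size(); ++k)
    {
      assert(rows[k].seconds > 0.0 && rows[k - 1].mpi_processes > 0);
      result.push_back({rows[k - 1].seconds / rows[k].seconds,
                        static_cast<double>(rows[k].mpi_processes) /
                          rows[k - 1].mpi_processes});
    }
  return result;
}
```

Fed with the HPC table, this yields measured speedups of roughly 0.95, 2.09, 10.34, 1.46 and 2.00, against MPI ratios of 2, 2, 2, 1.5 and 2. For the i7 table it yields 4.25 and 2.00, against 2 and 2.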
