Hi Denis,

> From the matrix-free tutorials 37 and 48, I see that the recommended
> flags for VectorizedArrays<double> with GCC are
>
> -DCMAKE_CXX_FLAGS="-march=native"
>
> How about using -O3, -ffast-math, -funroll-loops ? Any other
> recommended flags for GCC?

-march=native gives you AVX vectorization on most modern Intel CPUs
(starting with Sandy Bridge), which doubles the width of
VectorizedArray<double> from 2 to 4 doubles. For compute-bound algorithms
this almost doubles performance. Look for the line "-- Performing Test
DEAL_II_HAVE_AVX - Success" in the output of the deal.II configuration
(cmake) run to see whether it gets enabled.
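
If you want to see what the compiler itself enables for a given set of
flags, a rough sketch like the following can help. It is not deal.II
code: the function name doubles_per_register is made up for this
example, and it only inspects the macros that GCC and clang predefine
(__SSE2__, __AVX__, __AVX512F__). I am assuming here that the number of
doubles in a VectorizedArray<double> follows the widest enabled
instruction set, which matches the 2 versus 4 above.

```cpp
#include <cstddef>

// Hypothetical helper (not part of deal.II): number of doubles that fit
// into the widest SIMD register the compiler has been allowed to use,
// judged only from the predefined macros of GCC/clang.
constexpr std::size_t doubles_per_register()
{
#if defined(__AVX512F__)
  return 8; // 512-bit registers
#elif defined(__AVX__)
  return 4; // 256-bit registers, e.g. with -march=native on Sandy Bridge
#elif defined(__SSE2__)
  return 2; // 128-bit registers, the x86-64 baseline
#else
  return 1; // no known SIMD extension detected, scalar fallback
#endif
}

// The value is known at compile time, so it can be checked statically.
static_assert(doubles_per_register() >= 1, "at least one double per lane");
```

Printing doubles_per_register() from a small main(), compiled once with
and once without -march=native, shows whether the flag changes anything
on your machine.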

Compared with the potential 2x speedup from AVX, -O3 helps only a
little, and the same goes for -ffast-math. In all my benchmarks, their
impact has been at the level of noise. (One of my PhD students burns many
millions of CPU hours on big Sandy Bridge/Haswell clusters, which made me
spend hours writing proposals, so we have checked.) -funroll-loops
should be enabled by default, IIRC. For clang, I usually also pass
"-ffp-contract=fast" to enable fused multiply-add, since it is not on by
default there. GCC appears to do that already in its default settings.
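
To illustrate what contraction to a fused multiply-add changes, here is
a small sketch in plain C++ (again not deal.II code, and the function
names are made up). std::fma rounds only once, whereas a separate
multiply and add rounds the product first. The volatile temporary is
only there to keep the compiler from contracting the second variant on
its own, and the remarks about the results assume IEEE double
arithmetic.

```cpp
#include <cmath>

// Fused variant: a * b + c evaluated with a single rounding at the end.
double multiply_add_fused(const double a, const double b, const double c)
{
  return std::fma(a, b, c);
}

// Separate variant: the product is rounded to double before the addition.
// The volatile store prevents the compiler from fusing the two operations.
double multiply_add_separate(const double a, const double b, const double c)
{
  volatile double product = a * b;
  return product + c;
}
```

With a = 1 + 2^-27, b = 1 - 2^-27 and c = -1, the exact product is
1 - 2^-54, which rounds to 1.0 in double precision. The separate variant
therefore returns 0, while the fused variant returns -2^-54. In
matrix-free kernels the main benefit is one instruction instead of two,
but this is also why the results can differ in the last bits between
compilers.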

Best,
Martin
