Hi Denis,

> From the matrix-free tutorials 37 and 48, I see that the recommended
> flags for VectorizedArrays<double> with GCC are
>
> -DCMAKE_CXX_FLAGS="-march=native"
>
> How about using -O3, -ffast-math, -funroll-loops ? Any other
> recommended flags for GCC?

-march=native gives you AVX vectorization on most modern Intel CPUs (starting from Sandy Bridge), which doubles the width of VectorizedArray<double> from 2 to 4. For compute-bound algorithms this almost doubles performance. Look for the line "-- Performing Test DEAL_II_HAVE_AVX - Success" in the output of the deal.II configuration to see whether it gets enabled.
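
If you want to double-check what a given set of flags lets the compiler target, a small stand-alone sketch like the following can help. It does not use deal.II at all: it only looks at the macros that GCC and clang predefine (__SSE2__, __AVX__, __AVX512F__), and the function names are made up for this illustration. Keep in mind that the width of VectorizedArray<double> is decided when deal.II itself is configured, so this only tells you about the flags of the file being compiled.

```cpp
#include <cstddef>
#include <cstdio>

// Number of doubles that fit into the widest SIMD register the compiler is
// allowed to target, judged only from the compiler's predefined macros.
// (Hypothetical helper for illustration, not part of deal.II.)
constexpr std::size_t doubles_per_simd_register()
{
#if defined(__AVX512F__)
  return 8; // 512-bit registers
#elif defined(__AVX__)
  return 4; // 256-bit registers
#elif defined(__SSE2__)
  return 2; // 128-bit registers
#else
  return 1; // no SIMD extension detected, scalar only
#endif
}

// Print the detected width; call this from your own main().
void report_simd_width()
{
  std::printf("doubles per SIMD register: %zu\n",
              doubles_per_simd_register());
}
```

Compiling this once without and once with -march=native on the same machine should show the change from 2 to 4 on an AVX-capable CPU.
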
Compared to a potential 2x speedup with AVX, -O3 helps only a little, and so does -ffast-math. In all my benchmark tests, their impact has been on the level of noise. (One of my PhD students burns many millions of CPU hours on big Sandy Bridge/Haswell clusters, which made me spend hours writing proposals, so we have checked.) -funroll-loops should already be enabled by default, IIRC. For clang, I usually also pass "-ffp-contract=fast" to enable fused multiply-add, as it is not enabled by default. GCC appears to do that already in its default settings.

Best,
Martin
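
A footnote on the fused multiply-add remark, as a minimal sketch: for a plain expression like a * b + c, the compiler decides whether to emit one fused instruction or a separate multiply and add, depending on the target flags and the -ffp-contract setting. std::fma from <cmath> asks for the fused result (a single rounding) explicitly. The two function names below are made up for this illustration.

```cpp
#include <cmath>

// Plain expression: the compiler may emit a separate multiply and add
// (two roundings) or one fused instruction (one rounding), depending on
// the target architecture flags and the -ffp-contract setting.
double multiply_add_plain(double a, double b, double c)
{
  return a * b + c;
}

// Explicit request: std::fma computes a * b + c with a single rounding,
// independent of the compiler flags.
double multiply_add_fused(double a, double b, double c)
{
  return std::fma(a, b, c);
}
```

One way to see the difference is a = b = 1 + 2^-27 and c = -(1 + 2^-26). The exact product is 1 + 2^-26 + 2^-54, so the fused version returns 2^-54, whereas an evaluation with a separately rounded product returns 0. Which of the two the plain version gives tells you whether the compiler contracted the expression.
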