https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119
--- Comment #36 from Joost VandeVondele <Joost.VandeVondele at mat dot ethz.ch> ---
(In reply to Jerry DeLisle from comment #34)
> -Ofast does reorder execution..
> Opinions welcome.

That is absolutely OK for a matmul, and all techniques to get near peak
performance require that (e.g. use of fma, blocking, etc.).

I didn't realize that one can easily put pragmas for single routines, so you
could experiment with something like

#pragma GCC optimize ( "-Ofast -fvariable-expansion-in-unroller -funroll-loops" )
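
For illustration, a minimal sketch of scoping such a pragma to one routine.
The function name matmul_sketch and the naive triple loop are hypothetical and
are not the libgfortran code. The options are passed here as separate strings
and wrapped in push_options/pop_options so that only this routine is affected;
that wrapping is an assumption about how one might experiment, not something
proposed in this comment. The pragmas are GCC-specific.

```c
#include <stddef.h>

/* Save the command-line optimization options, then override them
   for the functions defined below (GCC-specific pragmas). */
#pragma GCC push_options
#pragma GCC optimize ("Ofast", "unroll-loops", "variable-expansion-in-unroller")

/* Hypothetical example routine, not libgfortran's matmul:
   computes c = a * b for n x n row-major matrices. */
void matmul_sketch(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            /* With -Ofast the compiler may reorder this reduction. */
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Restore the options given on the command line for later functions. */
#pragma GCC pop_options
```

Because -Ofast permits reassociation, results for general inputs may differ in
the last bits from a strictly ordered sum, which is the reordering discussed
in comment #34.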