https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119
--- Comment #37 from Joost VandeVondele <Joost.VandeVondele at mat dot ethz.ch> ---
(In reply to Joost VandeVondele from comment #36)
> #pragma GCC optimize ( "-Ofast -fvariable-expansion-in-unroller
> -funroll-loops" )

For larger matrices, -floop-nest-optimize would be really beneficial as well, in particular its blocking (this would be an additional motivation for PR14741 and for work on graphite in general). I don't know whether one can specify the blocking parameter. In principle, loop-nest optimization together with -Ofast (and ideally -march=native, which I assume we can't have in libgfortran) would yield near-peak performance.
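
To make the two ideas concrete, here is a rough C sketch, not libgfortran's actual matmul. It applies the per-function options through `#pragma GCC optimize` and does by hand the kind of blocking one would hope -floop-nest-optimize could produce. The names `matmul_blocked`, `matmul_naive` and the tile size `BLOCK` are hypothetical, and passing the options as separate strings (rather than one space-separated string as quoted above) is an assumption about the safest spelling.

```c
#include <stddef.h>

/* Hypothetical tile size; whether GCC exposes a blocking parameter
   is left open in the comment above. */
enum { BLOCK = 32 };

#pragma GCC push_options
/* Assumed spelling: one string per option. "O..." strings are taken as
   -O levels, the others get an implicit -f prefix. */
#pragma GCC optimize ("Ofast", "unroll-loops", "variable-expansion-in-unroller")

/* c += a * b for square n x n row-major matrices, tiled by hand so that
   each BLOCK x BLOCK tile of b is reused while it is still in cache. */
void matmul_blocked(size_t n, const double *a, const double *b, double *c)
{
    for (size_t ii = 0; ii < n; ii += BLOCK) {
        size_t i_end = ii + BLOCK < n ? ii + BLOCK : n;
        for (size_t kk = 0; kk < n; kk += BLOCK) {
            size_t k_end = kk + BLOCK < n ? kk + BLOCK : n;
            for (size_t jj = 0; jj < n; jj += BLOCK) {
                size_t j_end = jj + BLOCK < n ? jj + BLOCK : n;
                for (size_t i = ii; i < i_end; i++)
                    for (size_t k = kk; k < k_end; k++) {
                        double aik = a[i * n + k];
                        /* Unit-stride inner loop: the vectorizer's target. */
                        for (size_t j = jj; j < j_end; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
            }
        }
    }
}

#pragma GCC pop_options

/* Unblocked reference with the same semantics, built with the
   command-line options only. */
void matmul_naive(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++)
            for (size_t j = 0; j < n; j++)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
}
```

The pragma only affects functions defined between `push_options` and `pop_options`, so the rest of the translation unit keeps its command-line flags. Since -Ofast permits reassociation, results for general floating-point input may differ in the last bits from the naive loop.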