https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79946
--- Comment #2 from Adam Hirst <adam at aphirst dot karoo.co.uk> ---
Just for clarification: does this occur only in the case where one writes

  Dx = D%x
  tmp = matmul(NU,Dx)
  tensorproduct%x = dot_product(tmp,NV)

or does it also apply to

  tmp = matmul(NU,D%x)
  tensorproduct%x = dot_product(tmp,NV)

?

The distinction between the two came up in the discussion of PR79930. At least on my machine, the former performed much worse than the latter at -O2, and only recovered any performance once -Ofast and -flto were used.