[Bug target/79946] Suboptimal code with AVX2 copying all arguments to stack

adam at aphirst dot karoo.co.uk Tue, 07 Mar 2017 13:31:41 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79946


--- Comment #2 from Adam Hirst <adam at aphirst dot karoo.co.uk> ---
Just for clarification: is this only occurring in the case where one does

    Dx = D%x
    tmp = matmul(NU,Dx);
    tensorproduct%x = dot_product(tmp,NV)

or is it also applicable to

    tmp = matmul(NU,D%x);
    tensorproduct%x = dot_product(tmp,NV)

? The distinction between the two came up in the discussion for PR79930, and at
least on my machine, the code for the former performed much worse than the
latter at -O2, and only recovered once I used -Ofast and -flto.
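
For anyone wanting to compare the two forms side by side, here is a minimal
self-contained sketch. The type name Vect3D, the shapes NU(4), NV(4) and
D(4,4), the function names and the fill values are all assumptions made for
illustration; they are not taken from the actual test case.

```fortran
program compare_variants
  implicit none
  integer, parameter :: dp = kind(1.0d0)

  ! Hypothetical derived type; the real one may differ.
  type :: Vect3D
    real(dp) :: x, y, z
  end type Vect3D

  type(Vect3D) :: D(4,4)
  real(dp) :: NU(4), NV(4)
  integer :: i, j

  ! Arbitrary fill values, for illustration only.
  do j = 1, 4
    do i = 1, 4
      D(i,j) = Vect3D(real(i, dp), real(j, dp), real(i + j, dp))
    end do
  end do
  NU = [1.0_dp, 2.0_dp, 3.0_dp, 4.0_dp]
  NV = [0.5_dp, 0.25_dp, 0.125_dp, 0.0625_dp]

  print *, 'explicit copy:', x_with_copy(NU, D, NV)
  print *, 'direct D%x   :', x_direct(NU, D, NV)

contains

  ! Former variant: copy the x components into a temporary first.
  function x_with_copy(NU, D, NV) result(res)
    real(dp), intent(in) :: NU(4), NV(4)
    type(Vect3D), intent(in) :: D(4,4)
    real(dp) :: res
    real(dp) :: Dx(4,4), tmp(4)
    Dx = D%x
    tmp = matmul(NU, Dx)
    res = dot_product(tmp, NV)
  end function x_with_copy

  ! Latter variant: pass the component section D%x straight to matmul.
  function x_direct(NU, D, NV) result(res)
    real(dp), intent(in) :: NU(4), NV(4)
    type(Vect3D), intent(in) :: D(4,4)
    real(dp) :: res
    real(dp) :: tmp(4)
    tmp = matmul(NU, D%x)
    res = dot_product(tmp, NV)
  end function x_direct

end program compare_variants
```

Both functions should print the same value; any difference between them
would show up only in the generated code and the timings at a given
optimization level.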
