[llvm-bugs] [Bug 64306] x64 Codegen worse than MSVC for vectorized loop

LLVM Bugs via llvm-bugs Tue, 01 Aug 2023 05:29:59 -0700

Issue	64306
Summary	x64 Codegen worse than MSVC for vectorized loop
Labels
Assignees
Reporter	rainerzufalldererste

I haven't had the chance to investigate this in much depth yet, but this is the loop: https://godbolt.org/z/e3dMW5fbe (taken from https://github.com/rainerzufalldererste/simd_dct/blob/cstiller/avx_dct/src/simd_dct.cpp#L2062)


7680x7680 file, 1024 runs (AVX2 variant)

Zen4:
| compiler | avx (std-dev) | avg (std-dev) |
| :- | - | - |
| msvc | 0.32 clk/byte (0.31 ~ 0.32) | 13483.36 MiB/s (13345.56 ~ 13624.05) |
| clang++-16 | 0.37 clk/byte (0.36 ~ 0.38) | 11602.44 MiB/s (11401.21 ~ 11810.91) |
| g++-11.3 | 0.36 clk/byte (0.35 ~ 0.37) | 11828.99 MiB/s (11575.90 ~ 12093.38) |


Skylake-Client:
| compiler | avx (std-dev) | avg (std-dev) |
| :- | - | - |
| msvc | 0.63 clk/byte (0.61 ~ 0.65) | 4828.56 MiB/s (4700.72 ~ 4963.55) |
| clang++-16 | 0.64 clk/byte (0.61 ~ 0.67) | 4743.21 MiB/s (4550.35 ~ 4953.13) |
| g++-11.3 | 0.64 clk/byte (0.61 ~ 0.67) | 4737.80 MiB/s (4534.87 ~ 4959.73) |
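
To put the gap in the tables above into a single relative figure, the average throughputs can be compared directly. The following is only a small sketch of that arithmetic: the function names are hypothetical and not part of simd_dct, and the inputs are the average MiB/s values copied from the tables.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper (not part of simd_dct): relative throughput deficit of
// `candidate` versus `baseline`, in percent. Both arguments must use the same
// unit (MiB/s in the tables above).
double throughput_deficit_percent(double baseline, double candidate) {
    assert(baseline > 0.0);
    return (1.0 - candidate / baseline) * 100.0;
}

// Prints the deficit of clang++-16 relative to msvc for both machines,
// using the average MiB/s values from the tables.
void print_deficits() {
    std::printf("Zen4:           %.2f%%\n",
                throughput_deficit_percent(13483.36, 11602.44));
    std::printf("Skylake-Client: %.2f%%\n",
                throughput_deficit_percent(4828.56, 4743.21));
}
```

Calling `print_deficits()` from a `main` function shows that clang++-16 is roughly 14% behind msvc on Zen4 but just under 2% behind on Skylake-Client, where the reported ranges of all three compilers overlap.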

It is very surprising to see MSVC doing this well here. I'll try to investigate further to pin down which part of the generated assembly is particularly detrimental, but I wanted to document this somewhere for now. The AVX-512 variant performs significantly better with clang than with either GCC or MSVC, which is why this result surprised me so much.

_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs
