https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90579
            Bug ID: 90579
           Summary: Huge store forward stall due to vectorizer
           Product: gcc
           Version: 9.1.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hjl.tools at gmail dot com
  Target Milestone: ---
            Target: x86-64

loop/avx256 branch at

https://gitlab.com/x86-benchmarks/microbenchmark

shows huge store forward stall due to vectorizer in
---
extern double r[6];
extern double a[];

double
loop (int k, double x)
{
  int i;
  double t=0;
  for (i=0;i<6;i++)
    r[i] = x * a[i + k];
  for (i=0;i<6;i++)
    t+=r[5-i];
  return t;
}
---
when compiled with -O3 -march=skylake:

[hjl@gnu-cfl-1 microbenchmark]$ perf stat -e ld_blocks.store_forward ./event
loop: 229408

 Performance counter stats for './event':

                 1      ld_blocks.store_forward:u

       0.000478529 seconds time elapsed

       0.000502000 seconds user
       0.000000000 seconds sys

[hjl@gnu-cfl-1 microbenchmark]$ perf stat -e ld_blocks.store_forward ./event-avx128
loop: 191390

 Performance counter stats for './event-avx128':

                 1      ld_blocks.store_forward:u

       0.000526154 seconds time elapsed

       0.000507000 seconds user
       0.000000000 seconds sys

[hjl@gnu-cfl-1 microbenchmark]$ perf stat -e ld_blocks.store_forward ./event-avx256
loop: 1312864

 Performance counter stats for './event-avx256':

            30,001      ld_blocks.store_forward:u

       0.000756643 seconds time elapsed

       0.000723000 seconds user
       0.000000000 seconds sys

[hjl@gnu-cfl-1 microbenchmark]$
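For comparison, here is a source-level variant that sums the products from local scalars instead of reading them back from r[]. This is only a sketch and not part of the report: the name loop_no_reload, the definitions of r and a, and the contents of a are assumptions made so the example links on its own. It keeps the same stores to r[] and the same summation order as the testcase. Whether GCC generates code without the blocked loads for this form was not measured here.

```c
/* Hypothetical definitions; the report only declares these as extern. */
double r[6];
double a[16] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 };

/* The function from the report: store six products to r[], then sum
   them by loading r[] back in reverse order.  */
double
loop (int k, double x)
{
  int i;
  double t = 0;
  for (i = 0; i < 6; i++)
    r[i] = x * a[i + k];
  for (i = 0; i < 6; i++)
    t += r[5 - i];
  return t;
}

/* Assumed variant: keep each product in a local scalar, store it to r[]
   as before, and sum the locals in the same order (r[5] first, r[0]
   last), so the second phase has no source-level load from r[].  */
double
loop_no_reload (int k, double x)
{
  double p0 = x * a[k + 0];
  double p1 = x * a[k + 1];
  double p2 = x * a[k + 2];
  double p3 = x * a[k + 3];
  double p4 = x * a[k + 4];
  double p5 = x * a[k + 5];
  double t = 0;

  /* r[] is externally visible, so the stores must still happen.  */
  r[0] = p0;
  r[1] = p1;
  r[2] = p2;
  r[3] = p3;
  r[4] = p4;
  r[5] = p5;

  /* Same left-to-right association as t += r[5-i] for i = 0..5.  */
  t += p5;
  t += p4;
  t += p3;
  t += p2;
  t += p1;
  t += p0;
  return t;
}
```

Both functions should return the same value and leave the same contents in r[] for any k that keeps a[k..k+5] in bounds. Comparing the ld_blocks.store_forward count for the two, built with the same -O3 -march=skylake flags, would show whether the reload of r[] is what triggers the stall.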