https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117801

            Bug ID: 117801
           Summary: aarch64: 20% regression in TSVC s278 since
                    r15-3509-gd34cda72098867
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dhruvc at nvidia dot com
  Target Milestone: ---

The s278 kernel has the following code:

===
real_t s278(struct args_t * func_args)
{
//    control flow
//    if/goto to block if-then-else

    initialise_arrays(__func__);
    gettimeofday(&func_args->t1, NULL);

    for (int nl = 0; nl < iterations; nl++) {
        for (int i = 0; i < LEN_1D; i++) {
            if (a[i] > (real_t)0.) {
                goto L20;
            }
            b[i] = -b[i] + d[i] * e[i];
            goto L30;
L20:
            c[i] = -c[i] + d[i] * e[i];
L30:
            a[i] = b[i] + c[i] * d[i];
        }
        dummy(a, b, c, d, e, aa, bb, cc, 0.);
    }

    gettimeofday(&func_args->t2, NULL);
    return calc_checksum(__func__);
}
===

Since r15-3509-gd34cda720988674bcf8a24267c9e1ec61335d6de, an extra mov is being
generated in the inner loop. This (likely along with instruction ordering
differences) is causing a 20% slowdown in the execution of the kernel.

Assembly differences between GCC 15 and GCC 14.2:
https://godbolt.org/z/Kvaj5W6ve

Assembly differences on LLVM-MCA:
https://godbolt.org/z/9jWKf4c6n
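
For anyone who wants to look at the inner loop without the rest of the TSVC harness, the following is a minimal standalone sketch of the s278 loop body. It is not taken from the TSVC sources: the function names (s278_goto, s278_ifelse, s278_forms_agree), the pointer-parameter signature, the array size and the choice of float for real_t are all assumptions made for illustration. TSVC itself uses global arrays, so the code generated for this sketch may differ from what the godbolt links show. The second function is the structured if/else form that the "if/goto to block if-then-else" comment refers to, and the last function checks that both forms compute the same values.

```c
#include <stddef.h>

typedef float real_t; /* assumption: single precision */

/* Inner loop of s278 with the goto-based control flow from the report.
 * Assumption: arrays are passed as pointers instead of TSVC's globals. */
void s278_goto(size_t n, real_t *a, real_t *b, real_t *c,
               const real_t *d, const real_t *e)
{
    for (size_t i = 0; i < n; i++) {
        if (a[i] > (real_t)0.) {
            goto L20;
        }
        b[i] = -b[i] + d[i] * e[i];
        goto L30;
L20:
        c[i] = -c[i] + d[i] * e[i];
L30:
        a[i] = b[i] + c[i] * d[i];
    }
}

/* The same loop written as a structured if/else. */
void s278_ifelse(size_t n, real_t *a, real_t *b, real_t *c,
                 const real_t *d, const real_t *e)
{
    for (size_t i = 0; i < n; i++) {
        if (a[i] > (real_t)0.) {
            c[i] = -c[i] + d[i] * e[i];
        } else {
            b[i] = -b[i] + d[i] * e[i];
        }
        a[i] = b[i] + c[i] * d[i];
    }
}

/* Hypothetical self-check: returns 1 if both forms agree on a small input.
 * Inputs are small integers, so every intermediate value is exact in float
 * whether or not the compiler fuses multiply-adds. */
int s278_forms_agree(void)
{
    enum { N = 64 };
    real_t a1[N], b1[N], c1[N], a2[N], b2[N], c2[N], d[N], e[N];

    for (int i = 0; i < N; i++) {
        a1[i] = a2[i] = (real_t)(i % 3 - 1); /* mix of -1, 0, 1 hits both branches */
        b1[i] = b2[i] = (real_t)i;
        c1[i] = c2[i] = (real_t)(2 * i);
        d[i] = (real_t)(1 + i % 4);
        e[i] = (real_t)2;
    }

    s278_goto(N, a1, b1, c1, d, e);
    s278_ifelse(N, a2, b2, c2, d, e);

    for (int i = 0; i < N; i++) {
        if (a1[i] != a2[i] || b1[i] != b2[i] || c1[i] != c2[i]) {
            return 0;
        }
    }
    return 1;
}
```

Compiling this file with the two compilers being compared, at whatever optimisation level the benchmark run used (not stated above), should make it easier to check whether the extra mov also appears outside the TSVC harness.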