https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117801

            Bug ID: 117801
           Summary: aarch64: 20% regression in TSVC s278 since
                    r15-3509-gd34cda72098867
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dhruvc at nvidia dot com
  Target Milestone: ---

The s278 kernel has the following code:

===
real_t s278(struct args_t * func_args)
{

//    control flow
//    if/goto to block if-then-else

    initialise_arrays(__func__);
    gettimeofday(&func_args->t1, NULL);

    for (int nl = 0; nl < iterations; nl++) {
        for (int i = 0; i < LEN_1D; i++) {
            if (a[i] > (real_t)0.) {
                goto L20;
            }
            b[i] = -b[i] + d[i] * e[i];
            goto L30;
L20:
            c[i] = -c[i] + d[i] * e[i];
L30:
            a[i] = b[i] + c[i] * d[i];
        }
        dummy(a, b, c, d, e, aa, bb, cc, 0.);
    }

    gettimeofday(&func_args->t2, NULL);
    return calc_checksum(__func__);
}
===
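
For reference, the if/goto control flow in the inner loop is equivalent to a
plain if-then-else block. The sketch below is illustrative only and is not part
of TSVC: the helper names s278_inner_goto and s278_inner_structured are
hypothetical, real_t is assumed to be float, and the arrays are passed as
parameters rather than accessed as globals.

```c
#include <stddef.h>

typedef float real_t; /* assumption: real_t is float in this sketch */

/* Hypothetical helper: one pass of the inner loop, written with the
   same if/goto structure as the kernel above. */
static void s278_inner_goto(real_t *a, real_t *b, real_t *c,
                            const real_t *d, const real_t *e, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (a[i] > (real_t)0.) {
            goto L20;
        }
        b[i] = -b[i] + d[i] * e[i];
        goto L30;
L20:
        c[i] = -c[i] + d[i] * e[i];
L30:
        a[i] = b[i] + c[i] * d[i];
    }
}

/* Hypothetical helper: the same pass as a block if-then-else.
   Exactly one of b[i] or c[i] is updated before a[i] is recomputed. */
static void s278_inner_structured(real_t *a, real_t *b, real_t *c,
                                  const real_t *d, const real_t *e, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (a[i] > (real_t)0.) {
            c[i] = -c[i] + d[i] * e[i];
        } else {
            b[i] = -b[i] + d[i] * e[i];
        }
        a[i] = b[i] + c[i] * d[i];
    }
}
```

Both forms store to either b[i] or c[i] depending on a[i] and then store to
a[i] unconditionally, which is the shape the vectorizer has to if-convert.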

Since r15-3509-gd34cda720988674bcf8a24267c9e1ec61335d6de, an extra mov is
generated in the inner loop. This, likely together with instruction ordering
differences, causes a 20% slowdown in the execution of the kernel.

Assembly differences between GCC 15 and GCC 14.2:
https://godbolt.org/z/Kvaj5W6ve
Assembly differences on LLVM-MCA: https://godbolt.org/z/9jWKf4c6n
