Following the recent vectorizer changes that implement reductions via vector shifts, AArch64 now reduces loops such as this one:

unsigned char in[8] = {1, 3, 5, 7, 9, 11, 13, 15};

int
main (unsigned char argc, char **argv)
{
  unsigned char prod = 1;

  /* Prevent constant propagation of the entire loop below.  */
  asm volatile ("" : : : "memory");

  for (unsigned char i = 0; i < 8; i++)
    prod *= in[i];

  /* 1*3*5*7*9*11*13*15 = 2027025, i.e. 17 modulo 256.  */
  if (prod != 17)
    __builtin_printf ("Failed %d\n", prod);

  return 0;
}

using 'ext' instructions generated by aarch64_expand_vec_perm_const:

main:
        adrp    x0, .LANCHOR0
        movi    v2.2s, 0    <=== note the zero register set up here
        ldr     d1, [x0, #:lo12:.LANCHOR0]
        ext     v0.8b, v1.8b, v2.8b, #4
        mul     v1.8b, v1.8b, v0.8b
        ext     v0.8b, v1.8b, v2.8b, #2
        mul     v0.8b, v1.8b, v0.8b
        ext     v2.8b, v0.8b, v2.8b, #1
        mul     v0.8b, v0.8b, v2.8b
        umov    w1, v0.b[0]
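
Written out with ACLE NEON intrinsics, that sequence corresponds roughly to the following (a minimal sketch; reduce_prod_ext is an illustrative name, and little-endian lane numbering is assumed):

#include <arm_neon.h>

/* Illustrative only: the ext-based reduction above.  Each step pulls
   the upper lanes down, shifting in zeros from z, and multiplies;
   only lane 0 holds the full product at the end.  */
unsigned char
reduce_prod_ext (uint8x8_t v)
{
  uint8x8_t z = vdup_n_u8 (0);          /* the movi above */
  v = vmul_u8 (v, vext_u8 (v, z, 4));   /* ext #4 + mul */
  v = vmul_u8 (v, vext_u8 (v, z, 2));   /* ext #2 + mul */
  v = vmul_u8 (v, vext_u8 (v, z, 1));   /* ext #1 + mul */
  return vget_lane_u8 (v, 0);           /* umov */
}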

The 'ext' approach works for both 64-bit and 128-bit vectors, but for 64-bit vectors we can do slightly better using 'ushr': a 64-bit scalar shift pulls the upper lanes down and shifts in zeros itself, with no need for a separate zero register. This patch improves the above to:

main:
        adrp    x0, .LANCHOR0
        ldr     d0, [x0, #:lo12:.LANCHOR0]
        ushr    d1, d0, 32
        mul     v0.8b, v0.8b, v1.8b
        ushr    d1, d0, 16
        mul     v0.8b, v0.8b, v1.8b
        ushr    d1, d0, 8
        mul     v0.8b, v0.8b, v1.8b
        umov    w1, v0.b[0]
        ...
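
The ushr form looks like this in intrinsics (again a sketch with an illustrative name; the vector is reinterpreted as a 64-bit scalar so the whole-register shift is expressible):

#include <arm_neon.h>

/* Illustrative only: the ushr-based reduction, needing no zero register.
   A 64-bit scalar shift right by half the remaining width moves the
   upper lanes down and zeros the vacated ones; lane 0 ends up holding
   the product of all eight lanes.  */
unsigned char
reduce_prod_ushr (uint8x8_t v)
{
  v = vmul_u8 (v, vreinterpret_u8_u64
                    (vshr_n_u64 (vreinterpret_u64_u8 (v), 32)));  /* ushr 32 */
  v = vmul_u8 (v, vreinterpret_u8_u64
                    (vshr_n_u64 (vreinterpret_u64_u8 (v), 16)));  /* ushr 16 */
  v = vmul_u8 (v, vreinterpret_u8_u64
                    (vshr_n_u64 (vreinterpret_u64_u8 (v), 8)));   /* ushr 8 */
  return vget_lane_u8 (v, 0);                                     /* umov */
}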

Tested with bootstrap + check-gcc on aarch64-none-linux-gnu.
Cross-testing of check-gcc on aarch64_be-none-elf is in progress.

Ok if no regressions on big-endian?

Cheers,
--Alan

gcc/ChangeLog:

        * config/aarch64/aarch64-simd.md (vec_shr<mode>): New.

gcc/testsuite/ChangeLog:

        * lib/target-supports.exp
        (check_effective_target_whole_vector_shift): Add aarch64{,_be}.
