Hi Haochen,
on 2024/1/11 16:28, HAO CHEN GUI wrote:
> Hi,
> This patch eliminates unnecessary byte swaps for block clear on P8
> LE. For block clear, all the bytes are set to zero, so the byte order
> does not matter. The alignment of the destination can therefore be set
> to the size of the store mode instead of 1 byte, which eliminates
> unnecessary byte swap instructions on P8 LE. The test case shows the
> problem.
I agree with Richi's concern. A byte swap can be eliminated when the
byte-swapped result is known to be identical to the original value. One
typical case is a vector constant that satisfies the predicate
const_vector_each_byte_same, and we can do some optimization for that.
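To illustrate the point, here is a minimal standalone sketch in plain C,
not GCC internals. The helper names each_byte_same, swap_doublewords and
swap_is_noop are hypothetical, and modelling the P8 LE swap as an exchange
of the two 8-byte halves of a 16-byte vector is an assumption made only
for this example. It shows that a value whose bytes are all equal, such as
the all-zero vector used for block clear, is unchanged by the swap, while
a general value is not.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return 1 if all 16 bytes of V are identical.
   This only mimics the idea behind const_vector_each_byte_same; it is
   not the GCC predicate itself.  */
static int
each_byte_same (const uint8_t v[16])
{
  for (int i = 1; i < 16; i++)
    if (v[i] != v[0])
      return 0;
  return 1;
}

/* Assumed model of the swap: exchange the two doublewords (8-byte
   halves) of a 16-byte vector.  */
static void
swap_doublewords (uint8_t out[16], const uint8_t in[16])
{
  memcpy (out, in + 8, 8);
  memcpy (out + 8, in, 8);
}

/* Return 1 if swapping V leaves it unchanged, i.e. the swap is
   redundant for this particular value.  */
static int
swap_is_noop (const uint8_t v[16])
{
  uint8_t swapped[16];
  swap_doublewords (swapped, v);
  return memcmp (swapped, v, 16) == 0;
}
```

Any permutation of the bytes leaves such a value unchanged, so the same
argument holds for a full byte reversal as well. The condition depends
only on the constant being stored, not on the alignment of the
destination.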
BR,
Kewen
>
> Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is this OK for trunk?
>
> Thanks
> Gui Haochen
>
> ChangeLog
> rs6000: Eliminate unnecessary byte swaps for block clear on P8 LE
>
> gcc/
> PR target/113325
> * config/rs6000/rs6000-string.cc (expand_block_clear): Set the
> alignment of destination to the size of mode.
>
> gcc/testsuite/
> PR target/113325
> * gcc.target/powerpc/pr113325.c: New.
>
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000-string.cc
> b/gcc/config/rs6000/rs6000-string.cc
> index 7f777666ba9..4c9b2cbeefc 100644
> --- a/gcc/config/rs6000/rs6000-string.cc
> +++ b/gcc/config/rs6000/rs6000-string.cc
> @@ -140,7 +140,9 @@ expand_block_clear (rtx operands[])
> }
>
> dest = adjust_address (orig_dest, mode, offset);
> -
> + /* Set the alignment of dest to the size of mode in order to
> + avoid unnecessary byte swaps on LE. */
> + set_mem_align (dest, GET_MODE_SIZE (mode) * BITS_PER_UNIT);
> emit_move_insn (dest, CONST0_RTX (mode));
> }
>
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr113325.c
> b/gcc/testsuite/gcc.target/powerpc/pr113325.c
> new file mode 100644
> index 00000000000..4a3cae019c2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr113325.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
> +/* { dg-require-effective-target powerpc_p8vector_ok } */
> +/* { dg-final { scan-assembler-not {\mxxpermdi\M} } } */
> +
> +void* foo (void* s1)
> +{
> + return __builtin_memset (s1, 0, 32);
> +}