On Tue, Jun 24, 2025 at 1:26 PM H.J. Lu <hjl.to...@gmail.com> wrote:
>
> On Mon, Jun 23, 2025 at 4:53 PM Hongtao Liu <crazy...@gmail.com> wrote:
> >
> > On Mon, Jun 23, 2025 at 4:45 PM H.J. Lu <hjl.to...@gmail.com> wrote:
> > >
> > > On Mon, Jun 23, 2025 at 4:10 PM H.J. Lu <hjl.to...@gmail.com> wrote:
> > > >
> > > > On Mon, Jun 23, 2025 at 3:11 PM Hongtao Liu <crazy...@gmail.com> wrote:
> > > > >
> > > > > On Thu, Jun 19, 2025 at 10:25 AM H.J. Lu <hjl.to...@gmail.com> wrote:
> > > > > >
> > > > > > Extend the remove_redundant_vector pass to handle vector broadcasts
> > > > > > from constant and variable scalars.  When broadcasting from constants
> > > > > > and function arguments, we can place a single widest vector broadcast
> > > > > > at entry of the nearest common dominator for basic blocks with all
> > > > > > uses since constants and function arguments aren't changed.  For
> > > > > > broadcast from variables with a single definition, the single
> > > > > > definition is replaced with the widest broadcast.
> > > > > >
> > > > > > gcc/
> > > > > >
> > > > > >         PR target/92080
> > > > > >         * config/i386/i386-expand.cc (ix86_expand_call): Set
> > > > > >         recursive_function to true for recursive call.
> > > > > >         * config/i386/i386-features.cc (ix86_place_single_vector_set):
> > > > > >         Add an argument for inner scalar, default to nullptr.  Set the
> > > > > >         source from inner scalar if not nullptr.
> > > > > >         (ix86_get_vector_load_mode): Renamed to ...
> > > > > >         (ix86_get_vector_cse_mode): This.  Add an argument for scalar
> > > > > >         mode and handle integer and float scalar modes.
> > > > > >         (replace_vector_const): Add an argument for scalar mode and
> > > > > >         pass it to ix86_get_vector_load_mode.
> > > > > >         (x86_cse_kind): New.
> > > > > >         (redundant_load): Likewise.
> > > > > >         (ix86_broadcast_inner): Likewise.
> > > > > >         (remove_redundant_vector_load): Also support const0_rtx and
> > > > > >         constm1_rtx broadcasts.  Handle vector broadcasts from constant
> > > > > >         and variable scalars.
> > > > > >         * config/i386/i386.h (machine_function): Add
> > > > > >         recursive_function.
> > > > > >
> > > > > > gcc/testsuite/
> > > > > >
> > > > > >         * gcc.target/i386/keylocker-aesdecwide128kl.c: Updated to
> > > > > >         expect movdqa instead pxor.
> > > > > >         * gcc.target/i386/keylocker-aesdecwide256kl.c: Likewise.
> > > > > >         * gcc.target/i386/keylocker-aesencwide128kl.c: Likewise.
> > > > > >         * gcc.target/i386/keylocker-aesencwide256kl.c: Likewise.
> > > > > >         * gcc.target/i386/pr92080-4.c: New test.
> > > > > >         * gcc.target/i386/pr92080-5.c: Likewise.
> > > > > >         * gcc.target/i386/pr92080-6.c: Likewise.
> > > > > >         * gcc.target/i386/pr92080-7.c: Likewise.
> > > > > >         * gcc.target/i386/pr92080-8.c: Likewise.
> > > > > >         * gcc.target/i386/pr92080-9.c: Likewise.
> > > > > >         * gcc.target/i386/pr92080-10.c: Likewise.
> > > > > >         * gcc.target/i386/pr92080-11.c: Likewise.
> > > > > >         * gcc.target/i386/pr92080-12.c: Likewise.
> > > > > >         * gcc.target/i386/pr92080-13.c: Likewise.
> > > > > >         * gcc.target/i386/pr92080-14.c: Likewise.
> > > > > >         * gcc.target/i386/pr92080-15.c: Likewise.
> > > > > >         * gcc.target/i386/pr92080-16.c: Likewise.
> > > > > >
> > > > > > Signed-off-by: H.J. Lu <hjl.to...@gmail.com>
> > > > > > ---
> > > > > >  gcc/config/i386/i386-expand.cc              |   3 +
> > > > > >  gcc/config/i386/i386-features.cc            | 410 ++++++++++++++----
> > > > > >  gcc/config/i386/i386.h                      |   3 +
> > > > > >  .../i386/keylocker-aesdecwide128kl.c        |  14 +-
> > > > > >  .../i386/keylocker-aesdecwide256kl.c        |  14 +-
> > > > > >  .../i386/keylocker-aesencwide128kl.c        |  14 +-
> > > > > >  .../i386/keylocker-aesencwide256kl.c        |  14 +-
> > > > > >  gcc/testsuite/gcc.target/i386/pr92080-10.c  |  13 +
> > > > > >  gcc/testsuite/gcc.target/i386/pr92080-11.c  |  33 ++
> > > > > >  gcc/testsuite/gcc.target/i386/pr92080-12.c  |  16 +
> > > > > >  gcc/testsuite/gcc.target/i386/pr92080-13.c  |  32 ++
> > > > > >  gcc/testsuite/gcc.target/i386/pr92080-14.c  |  31 ++
> > > > > >  gcc/testsuite/gcc.target/i386/pr92080-15.c  |  25 ++
> > > > > >  gcc/testsuite/gcc.target/i386/pr92080-16.c  |  26 ++
> > > > > >  gcc/testsuite/gcc.target/i386/pr92080-4.c   |  50 +++
> > > > > >  gcc/testsuite/gcc.target/i386/pr92080-5.c   | 109 +++++
> > > > > >  gcc/testsuite/gcc.target/i386/pr92080-6.c   |  19 +
> > > > > >  gcc/testsuite/gcc.target/i386/pr92080-7.c   |  20 +
> > > > > >  gcc/testsuite/gcc.target/i386/pr92080-8.c   |  16 +
> > > > > >  gcc/testsuite/gcc.target/i386/pr92080-9.c   |  81 ++++
> > > > > >  20 files changed, 823 insertions(+), 120 deletions(-)
> > > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-10.c
> > > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-11.c
> > > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-12.c
> > > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-13.c
> > > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-14.c
> > > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-15.c
> > > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-16.c
> > > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-4.c
> > > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-5.c
> > > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-6.c
> > > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-7.c
> > > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-8.c
> > > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-9.c
> > > > > >
> > > > > > +  else
> > > > > > +    {
> > > > > > +      while (SUBREG_P (dest))
> > > > > > +        dest = SUBREG_REG (dest);
> > > > > > +
> > > > > > +      /* Skip if the SET destination mode doesn't match.  */
> > > > > > +      if (GET_MODE (dest) != mode)
> > > > > > +        return nullptr;
> > > > >
> > > > > Can we just require (dest == reg || dest == op), otherwise we need to
> > > > > make sure GET_MODE of the original dest can cover mode of op(which is
> > > > > more complicated, need to make sure SUBREG_BYTE is also zero???)
> > > >
> > > > I will change it to
> > > >
> > > >   /* Skip if the SET destination isn't the broadcast source.  */
> > > >   if (dest != reg)
> > > >     return nullptr;
> > >
> > > Here is the v4 patch with:
> > >
> > >   /* The SET destination must be the broadcast source.  */
> > >   gcc_assert (dest == op);
> > I don't understand this, looks like you're post the dump patch instead
> > of the original one.
>
> Ooops.  Here is the real v4 patch which simplifies ix86_broadcast_inner
> to
>
>   rtx src = SET_SRC (set);
>
>   if (CONST_INT_P (src))
>     {
>       op = src;
>       *insn_p = nullptr;
>     }
>   else
>     {
>       *insn_p = insn;
>     }
>
>   *scalar_mode_p = mode;
>   return op;
>
> OK for master?
Ok.
>
> Thanks.
>
> --
> H.J.

--
BR,
Hongtao