On Tue, Jun 24, 2025 at 1:26 PM H.J. Lu <hjl.to...@gmail.com> wrote:
>
> On Mon, Jun 23, 2025 at 4:53 PM Hongtao Liu <crazy...@gmail.com> wrote:
> >
> > On Mon, Jun 23, 2025 at 4:45 PM H.J. Lu <hjl.to...@gmail.com> wrote:
> > >
> > > On Mon, Jun 23, 2025 at 4:10 PM H.J. Lu <hjl.to...@gmail.com> wrote:
> > > >
> > > > On Mon, Jun 23, 2025 at 3:11 PM Hongtao Liu <crazy...@gmail.com> wrote:
> > > > >
> > > > > On Thu, Jun 19, 2025 at 10:25 AM H.J. Lu <hjl.to...@gmail.com> wrote:
> > > > > >
> > > > > > > Extend the remove_redundant_vector pass to handle vector broadcasts from
> > > > > > > constant and variable scalars.  When broadcasting from constants and
> > > > > > > function arguments, we can place a single widest vector broadcast at
> > > > > > > entry of the nearest common dominator for basic blocks with all uses
> > > > > > > since constants and function arguments aren't changed.  For broadcast
> > > > > > > from variables with a single definition, the single definition is
> > > > > > > replaced with the widest broadcast.
> > > > > >
> > > > > > gcc/
> > > > > >
> > > > > >         PR target/92080
> > > > > >         * config/i386/i386-expand.cc (ix86_expand_call): Set
> > > > > >         recursive_function to true for recursive call.
> > > > > > >         * config/i386/i386-features.cc (ix86_place_single_vector_set):
> > > > > > >         Add an argument for inner scalar, default to nullptr.  Set the
> > > > > > >         source from inner scalar if not nullptr.
> > > > > > >         (ix86_get_vector_load_mode): Renamed to ...
> > > > > > >         (ix86_get_vector_cse_mode): This.  Add an argument for scalar mode
> > > > > > >         and handle integer and float scalar modes.
> > > > > > >         (replace_vector_const): Add an argument for scalar mode and pass
> > > > > > >         it to ix86_get_vector_load_mode.
> > > > > >         (x86_cse_kind): New.
> > > > > >         (redundant_load): Likewise.
> > > > > >         (ix86_broadcast_inner): Likewise.
> > > > > > >         (remove_redundant_vector_load): Also support const0_rtx and
> > > > > > >         constm1_rtx broadcasts.  Handle vector broadcasts from constant
> > > > > > >         and variable scalars.
> > > > > > >         * config/i386/i386.h (machine_function): Add recursive_function.
> > > > > >
> > > > > > gcc/testsuite/
> > > > > >
> > > > > > >         * gcc.target/i386/keylocker-aesdecwide128kl.c: Updated to expect
> > > > > > >         movdqa instead of pxor.
> > > > > >         * gcc.target/i386/keylocker-aesdecwide256kl.c: Likewise.
> > > > > >         * gcc.target/i386/keylocker-aesencwide128kl.c: Likewise.
> > > > > >         * gcc.target/i386/keylocker-aesencwide256kl.c: Likewise.
> > > > > >         * gcc.target/i386/pr92080-4.c: New test.
> > > > > >         * gcc.target/i386/pr92080-5.c: Likewise.
> > > > > >         * gcc.target/i386/pr92080-6.c: Likewise.
> > > > > >         * gcc.target/i386/pr92080-7.c: Likewise.
> > > > > >         * gcc.target/i386/pr92080-8.c: Likewise.
> > > > > >         * gcc.target/i386/pr92080-9.c: Likewise.
> > > > > >         * gcc.target/i386/pr92080-10.c: Likewise.
> > > > > >         * gcc.target/i386/pr92080-11.c: Likewise.
> > > > > >         * gcc.target/i386/pr92080-12.c: Likewise.
> > > > > >         * gcc.target/i386/pr92080-13.c: Likewise.
> > > > > >         * gcc.target/i386/pr92080-14.c: Likewise.
> > > > > >         * gcc.target/i386/pr92080-15.c: Likewise.
> > > > > >         * gcc.target/i386/pr92080-16.c: Likewise.
> > > > > >
> > > > > > Signed-off-by: H.J. Lu <hjl.to...@gmail.com>
> > > > > > ---
> > > > > >  gcc/config/i386/i386-expand.cc                |   3 +
> > > > > > >  gcc/config/i386/i386-features.cc              | 410 ++++++++++++++----
> > > > > >  gcc/config/i386/i386.h                        |   3 +
> > > > > >  .../i386/keylocker-aesdecwide128kl.c          |  14 +-
> > > > > >  .../i386/keylocker-aesdecwide256kl.c          |  14 +-
> > > > > >  .../i386/keylocker-aesencwide128kl.c          |  14 +-
> > > > > >  .../i386/keylocker-aesencwide256kl.c          |  14 +-
> > > > > >  gcc/testsuite/gcc.target/i386/pr92080-10.c    |  13 +
> > > > > >  gcc/testsuite/gcc.target/i386/pr92080-11.c    |  33 ++
> > > > > >  gcc/testsuite/gcc.target/i386/pr92080-12.c    |  16 +
> > > > > >  gcc/testsuite/gcc.target/i386/pr92080-13.c    |  32 ++
> > > > > >  gcc/testsuite/gcc.target/i386/pr92080-14.c    |  31 ++
> > > > > >  gcc/testsuite/gcc.target/i386/pr92080-15.c    |  25 ++
> > > > > >  gcc/testsuite/gcc.target/i386/pr92080-16.c    |  26 ++
> > > > > >  gcc/testsuite/gcc.target/i386/pr92080-4.c     |  50 +++
> > > > > >  gcc/testsuite/gcc.target/i386/pr92080-5.c     | 109 +++++
> > > > > >  gcc/testsuite/gcc.target/i386/pr92080-6.c     |  19 +
> > > > > >  gcc/testsuite/gcc.target/i386/pr92080-7.c     |  20 +
> > > > > >  gcc/testsuite/gcc.target/i386/pr92080-8.c     |  16 +
> > > > > >  gcc/testsuite/gcc.target/i386/pr92080-9.c     |  81 ++++
> > > > > >  20 files changed, 823 insertions(+), 120 deletions(-)
> > > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-10.c
> > > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-11.c
> > > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-12.c
> > > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-13.c
> > > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-14.c
> > > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-15.c
> > > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-16.c
> > > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-4.c
> > > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-5.c
> > > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-6.c
> > > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-7.c
> > > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-8.c
> > > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-9.c
> > > > > >
> > >
> > > > > > +  else
> > > > > > +    {
> > > > > > +      while (SUBREG_P (dest))
> > > > > > +       dest = SUBREG_REG (dest);
> > > > > > +
> > > > > > +      /* Skip if the SET destination mode doesn't match.  */
> > > > > > +      if (GET_MODE (dest) != mode)
> > > > > > +       return nullptr;
> > > > >
> > > > > > Can we just require (dest == reg || dest == op)?  Otherwise we need to
> > > > > > make sure GET_MODE of the original dest can cover the mode of op (which
> > > > > > is more complicated; we would also need to make sure SUBREG_BYTE is zero?)
> > > >
> > > > I will change it to
> > > >
> > > >       /* Skip if the SET destination isn't the broadcast source.  */
> > > >       if (dest != reg)
> > > >         return nullptr;
> > >
> > > Here is the v4 patch with:
> > >
> > >       /* The SET destination must be the broadcast source.  */
> > >       gcc_assert (dest == op);
> > I don't understand this; it looks like you posted the dump patch instead
> > of the original one.
>
> Oops.  Here is the real v4 patch, which simplifies ix86_broadcast_inner
> to:
>
>   rtx src = SET_SRC (set);
>
>   if (CONST_INT_P (src))
>     {
>       op = src;
>       *insn_p = nullptr;
>     }
>   else
>     {
>       *insn_p = insn;
>     }
>
>   *scalar_mode_p = mode;
>   return op;
>
> OK for master?
Ok.
>
> Thanks.
>
> --
> H.J.



-- 
BR,
Hongtao
