Re: [PATCH v2] x86: Insert extra move for mode size smaller than natural size

Hongtao Liu Tue, 06 May 2025 18:18:21 -0700

On Wed, May 7, 2025 at 9:06 AM H.J. Lu <hjl.to...@gmail.com> wrote:
>
> On Tue, May 6, 2025 at 3:35 PM Hongtao Liu <crazy...@gmail.com> wrote:
> >
> > On Tue, May 6, 2025 at 3:06 PM H.J. Lu <hjl.to...@gmail.com> wrote:
> > >
> > > On Tue, May 6, 2025 at 2:30 PM Liu, Hongtao <hongtao....@intel.com> wrote:
> > > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: H.J. Lu <hjl.to...@gmail.com>
> > > > > Sent: Tuesday, May 6, 2025 2:16 PM
> > > > > To: Liu, Hongtao <hongtao....@intel.com>
> > > > > Cc: GCC Patches <gcc-patches@gcc.gnu.org>; Uros Bizjak
> > > > > <ubiz...@gmail.com>
> > > > > Subject: Re: [PATCH] x86: Skip if the mode size is smaller than its 
> > > > > natural size
> > > > >
> > > > > On Tue, May 6, 2025 at 10:54 AM Liu, Hongtao <hongtao....@intel.com>
> > > > > wrote:
> > > > > >
> > > > > >
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: H.J. Lu <hjl.to...@gmail.com>
> > > > > > > Sent: Thursday, May 1, 2025 6:39 AM
> > > > > > > To: GCC Patches <gcc-patches@gcc.gnu.org>; Uros Bizjak
> > > > > > > <ubiz...@gmail.com>; Liu, Hongtao <hongtao....@intel.com>
> > > > > > > Subject: [PATCH] x86: Skip if the mode size is smaller than its
> > > > > > > natural size
> > > > > > >
> > > > > > > When generating a SUBREG from V16QI to V2HF, validate_subreg fails
> > > > > > > since the V2HF size (4 bytes) is smaller than its natural size 
> > > > > > > (word size).
> > > > > > > Update remove_redundant_vector_load to skip if the mode size is
> > > > > > > smaller than its natural size.
> > > > > > I think we can also handle it in replace_vector_const by inserting 
> > > > > > an
> > > > > > extra move with (Set (reg:v4qi) (subreg:v4qi (v16qi const0_rtx) 0))
> > > > > > And then use subreg with same vector size (v2hf<->v4qi) (set
> > > > > > (reg:v2hf) (subreg:v2hf (reg:v4qi) 0))
> > > > >
> > > > > What is the advantage of this approach?  My patch uses a single 
> > > > > instruction to
> > > > > write 4 bytes of 0s and 1s.  Your suggestion needs at least one more
> > > > > instruction.
> > > > I'm not asking to do it for all the cases, just to handle those cases 
> > > > with invalid subreg
> > > >
> > > > @@ -3334,8 +3334,11 @@ replace_vector_const (machine_mode vector_mode, 
> > > > rtx vector_const,
> > > >        machine_mode mode = GET_MODE (dest);
> > > >
> > > >        rtx replace;
> > > > +      if (!validate_subreg (mode, vector_mode, vector_const, 0))
> > > > +       /* Insert an extra move to avoid invalid subreg.  */
> > > > +      .........
> > > >        /* Replace the source operand with VECTOR_CONST.  */
> > > > -      if (SUBREG_P (dest) || mode == vector_mode)
> > > > +      else if (SUBREG_P (dest) || mode == vector_mode)
> > > >         replace = vector_const;
> > > >        else
> > > >         replace = gen_rtx_SUBREG (mode, vector_const, 0);
> > > >
> > > > For valid subreg, no need for extra instruction.
> > > > I think RA can eliminate the extra move, then the optimization is not 
> > > > limited to "the mode size is smaller than its natural size".
> > >
> > > The only "the mode size is smaller than its natural size" cases are 4
> > > bytes (64-bit mode)
> > > or 2 bytes (32-bit mode).  In both cases, there is no need for vector
> > > write with subreg.
> > It depends on the use, if the use must be put into the vector
> > register, .i.e below testcase, I assume your patch will restore the
> > extra pxor?
> >
> > typedef char v4qi __attribute__((vector_size(4)));
> > typedef char v16qi __attribute__((vector_size(16)));
> >
> >
> > v4qi a;
> > v16qi b;
> > void
> > foo (v4qi* c, v16qi* d)
> > {
> >     v4qi sum = __extension__(v4qi){0, 0, 0, 0};
> >     v16qi sum2 = __extension__(v16qi){0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> > 0, 0, 0, 0, 0};
> >     for (int i = 0; i != 100; i++)
> >      sum += c[i];
> >     for (int i = 0 ; i != 100; i++)
> >       sum2 += d[i];
> >     a = sum;
> >     b = sum2;
> > }
> >
> > And since this patch is to solve the issue of invalid subreg, why
> > don't we just use validate_subreg to guard in
> > remove_redundant_vector_load.
> > >
>
> Fixed in the v2 patch with 2 new tests.
>


      /* NB: Don't run recog_memoized here since vector SUBREG may not
  be valid.  Let LRA handle vector SUBREG.  */

I guess this comment can be removed?

Otherwise LGTM.

> Thanks.
>
> --
> H.J.
> ---
> When generating a SUBREG from V16QI to V2HF, validate_subreg fails since
> V2HF is a floating point vector and its size (4 bytes) is smaller than its
> natural size (word size).  Insert an extra move with a QI vector SUBREG of
> the same size to avoid validate_subreg failure.
>
> gcc/
>
> PR target/120036
> * config/i386/i386-features.cc (ix86_get_vector_load_mode):
> Handle 8/4/2 bytes.
> (remove_redundant_vector_load): If the mode size is smaller than
> its natural size, first insert an extra move with a QI vector
> SUBREG of the same size to avoid validate_subreg failure.
>
> gcc/testsuite/
>
> PR target/120036
> * g++.target/i386/pr120036.C: New test.
> * gcc.target/i386/pr117839-3a.c: Likewise.
> * gcc.target/i386/pr117839-3b.c: Likewise.



-- 
BR,
Hongtao

Re: [PATCH v2] x86: Insert extra move for mode size smaller than natural size

Reply via email to