On Wed, May 7, 2025 at 9:06 AM H.J. Lu <hjl.to...@gmail.com> wrote: > > On Tue, May 6, 2025 at 3:35 PM Hongtao Liu <crazy...@gmail.com> wrote: > > > > On Tue, May 6, 2025 at 3:06 PM H.J. Lu <hjl.to...@gmail.com> wrote: > > > > > > On Tue, May 6, 2025 at 2:30 PM Liu, Hongtao <hongtao....@intel.com> wrote: > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > From: H.J. Lu <hjl.to...@gmail.com> > > > > > Sent: Tuesday, May 6, 2025 2:16 PM > > > > > To: Liu, Hongtao <hongtao....@intel.com> > > > > > Cc: GCC Patches <gcc-patches@gcc.gnu.org>; Uros Bizjak > > > > > <ubiz...@gmail.com> > > > > > Subject: Re: [PATCH] x86: Skip if the mode size is smaller than its > > > > > natural size > > > > > > > > > > On Tue, May 6, 2025 at 10:54 AM Liu, Hongtao <hongtao....@intel.com> > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > From: H.J. Lu <hjl.to...@gmail.com> > > > > > > > Sent: Thursday, May 1, 2025 6:39 AM > > > > > > > To: GCC Patches <gcc-patches@gcc.gnu.org>; Uros Bizjak > > > > > > > <ubiz...@gmail.com>; Liu, Hongtao <hongtao....@intel.com> > > > > > > > Subject: [PATCH] x86: Skip if the mode size is smaller than its > > > > > > > natural size > > > > > > > > > > > > > > When generating a SUBREG from V16QI to V2HF, validate_subreg fails > > > > > > > since the V2HF size (4 bytes) is smaller than its natural size > > > > > > > (word size). > > > > > > > Update remove_redundant_vector_load to skip if the mode size is > > > > > > > smaller than its natural size. > > > > > > I think we can also handle it in replace_vector_const by inserting > > > > > > an > > > > > > extra move with (Set (reg:v4qi) (subreg:v4qi (v16qi const0_rtx) 0)) > > > > > > And then use subreg with same vector size (v2hf<->v4qi) (set > > > > > > (reg:v2hf) (subreg:v2hf (reg:v4qi) 0)) > > > > > > > > > > What is the advantage of this approach? My patch uses a single > > > > > instruction to > > > > > write 4 bytes of 0s and 1s. Your suggestion needs at least one more > > > > > instruction. > > > > I'm not asking to do it for all the cases, just to handle those cases > > > > with invalid subreg > > > > > > > > @@ -3334,8 +3334,11 @@ replace_vector_const (machine_mode vector_mode, > > > > rtx vector_const, > > > > machine_mode mode = GET_MODE (dest); > > > > > > > > rtx replace; > > > > + if (!validate_subreg (mode, vector_mode, vector_const, 0)) > > > > + /* Insert an extra move to avoid invalid subreg. */ > > > > + ......... > > > > /* Replace the source operand with VECTOR_CONST. */ > > > > - if (SUBREG_P (dest) || mode == vector_mode) > > > > + else if (SUBREG_P (dest) || mode == vector_mode) > > > > replace = vector_const; > > > > else > > > > replace = gen_rtx_SUBREG (mode, vector_const, 0); > > > > > > > > For valid subreg, no need for extra instruction. > > > > I think RA can eliminate the extra move, then the optimization is not > > > > limited to "the mode size is smaller than its natural size". > > > > > > The only "the mode size is smaller than its natural size" cases are 4 > > > bytes (64-bit mode) > > > or 2 bytes (32-bit mode). In both cases, there is no need for vector > > > write with subreg. > > It depends on the use, if the use must be put into the vector > > register, .i.e below testcase, I assume your patch will restore the > > extra pxor? > > > > typedef char v4qi __attribute__((vector_size(4))); > > typedef char v16qi __attribute__((vector_size(16))); > > > > > > v4qi a; > > v16qi b; > > void > > foo (v4qi* c, v16qi* d) > > { > > v4qi sum = __extension__(v4qi){0, 0, 0, 0}; > > v16qi sum2 = __extension__(v16qi){0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0}; > > for (int i = 0; i != 100; i++) > > sum += c[i]; > > for (int i = 0 ; i != 100; i++) > > sum2 += d[i]; > > a = sum; > > b = sum2; > > } > > > > And since this patch is to solve the issue of invalid subreg, why > > don't we just use validate_subreg to guard in > > remove_redundant_vector_load. > > > > > Fixed in the v2 patch with 2 new tests. >
/* NB: Don't run recog_memoized here since vector SUBREG may not be valid. Let LRA handle vector SUBREG. */ I guess this comment can be removed? Otherwise LGTM. > Thanks. > > -- > H.J. > --- > When generating a SUBREG from V16QI to V2HF, validate_subreg fails since > V2HF is a floating point vector and its size (4 bytes) is smaller than its > natural size (word size). Insert an extra move with a QI vector SUBREG of > the same size to avoid validate_subreg failure. > > gcc/ > > PR target/120036 > * config/i386/i386-features.cc (ix86_get_vector_load_mode): > Handle 8/4/2 bytes. > (remove_redundant_vector_load): If the mode size is smaller than > its natural size, first insert an extra move with a QI vector > SUBREG of the same size to avoid validate_subreg failure. > > gcc/testsuite/ > > PR target/120036 > * g++.target/i386/pr120036.C: New test. > * gcc.target/i386/pr117839-3a.c: Likewise. > * gcc.target/i386/pr117839-3b.c: Likewise. -- BR, Hongtao