On Tue, May 19, 2026 at 11:42 AM Andrew Pinski <[email protected]> wrote:
> On Tue, May 19, 2026 at 11:27 AM Pengxuan Zheng
> <[email protected]> wrote:
> >
> > On Mon, May 18, 2026 at 3:32 PM Andrew Pinski <[email protected]> wrote:
> >>
> >> On Mon, May 18, 2026 at 10:47 AM Pengxuan Zheng
> >> <[email protected]> wrote:
> >> >
> >> > This enables the vectorizer to vectorize conversion from long to float for
> >> > aarch64 target.
> >> >
> >> > Bootstrapped and tested on aarch64_linux_gnu.
> >> >
> >> >         PR target/123748
> >> >
> >> > gcc/ChangeLog:
> >> >
> >> >         * config/aarch64/aarch64-simd.md (vec_packs_float_v2di): New pattern.
> >> >
> >> > gcc/testsuite/ChangeLog:
> >> >
> >> >         * gcc.target/aarch64/pr123748.c: New test.
> >> >
> >> > Signed-off-by: Pengxuan Zheng <[email protected]>
> >> > ---
> >> >  gcc/config/aarch64/aarch64-simd.md          | 26 +++++++++++++++++++++
> >> >  gcc/testsuite/gcc.target/aarch64/pr123748.c | 13 +++++++++++
> >> >  2 files changed, 39 insertions(+)
> >> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/pr123748.c
> >> >
> >> > diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> >> > index 2e142b1e1ee..21f13564280 100644
> >> > --- a/gcc/config/aarch64/aarch64-simd.md
> >> > +++ b/gcc/config/aarch64/aarch64-simd.md
> >> > @@ -3206,6 +3206,32 @@ (define_insn "<FCVT_FIXED2F:fcvt_fixed_insn><VDQ_HSDI:mode>3"
> >> >    [(set_attr "type" "neon_int_to_fp_<VDQ_HSDI:stype><q>")]
> >> >  )
> >> >
> >> > +(define_expand "vec_packs_float_v2di"
> >> > +  [(set (match_operand:V4SF 0 "register_operand" "=w")
> >> > +       (vec_concat:V4SF
> >> > +         (float:V2SF (match_operand:V2DI 1 "register_operand" "w"))
> >> > +         (float:V2SF (match_operand:V2DI 2 "register_operand" "w"))))]
> >> > +  "TARGET_SIMD && flag_unsafe_math_optimizations"
> >> > +  {
> >> > +    rtx tmp = gen_reg_rtx (V2DFmode);
> >> > +    rtx tmp1 = gen_reg_rtx (V2DFmode);
> >> > +    rtx tmp2 = gen_reg_rtx (V2SFmode);
> >> > +    rtx tmp3 = gen_reg_rtx (V2SFmode);
> >> > +    emit_insn (gen_floatv2div2df2 (tmp, operands[1]));
> >> > +    emit_insn (gen_floatv2div2df2 (tmp1, operands[2]));
> >> > +    emit_insn (gen_truncv2dfv2sf2 (tmp2, tmp));
> >> > +    emit_insn (gen_truncv2dfv2sf2 (tmp3, tmp1));
> >>
> >> Can you add a comment in front of this which describes what it is doing.
> >> Something like:
> >> /* V4DI -> V4SF is done as 2*V2DI->2*V2DF->2*V2SF and then combined
> >> together to form V4SF. */
> >> /* Since there is an extra rounding step, unsafe math optimization
> >> needs to be on. */
> >>
> >> Otherwise ok.
> >
> >
> > Thanks, Andrew. I've added the comment you suggested and
> > pushed the change as r17-606-gf2d10af76.
>
> I missed something during the review; it is just a testcase issue.
> Can you change `long` to be `long long` so the testcase works on
> aarch64 targets where `long` is 32 bits (e.g. mingw).
>

Sure, I've updated the testcase and committed as obvious
(r17-612-gc99177366b).

Thanks,
Pengxuan

>
> Thanks,
> Andrew
>
> >
> > Thanks,
> > Pengxuan
> >>
> >>
> >> Thanks,
> >> Andrew
> >>
> >>
> >> > +    if (BYTES_BIG_ENDIAN)
> >> > +      std::swap (tmp2, tmp3);
> >> > +
> >> > +    rtx tmp4 = gen_reg_rtx (V2DImode);
> >> > +    emit_insn (gen_aarch64_zip1v2di_low (tmp4, gen_lowpart (DImode, tmp2),
> >> > +                                         gen_lowpart (DImode, tmp3)));
> >> > +    emit_move_insn (operands[0], gen_lowpart (V4SFmode, tmp4));
> >> > +    DONE;
> >> > +  }
> >> > +)
> >> > +
> >> >  ;; ??? Note that the vectorizer usage of the vec_unpacks_[lo/hi] patterns
> >> >  ;; is inconsistent with vector ordering elsewhere in the compiler, in that
> >> >  ;; the meaning of HI and LO changes depending on the target endianness.
> >> > diff --git a/gcc/testsuite/gcc.target/aarch64/pr123748.c b/gcc/testsuite/gcc.target/aarch64/pr123748.c
> >> > new file mode 100644
> >> > index 00000000000..8ba290cf12d
> >> > --- /dev/null
> >> > +++ b/gcc/testsuite/gcc.target/aarch64/pr123748.c
> >> > @@ -0,0 +1,13 @@
> >> > +/* { dg-do compile } */
> >> > +/* { dg-options "-Ofast" } */
> >> > +
> >> > +void
> >> > +f (float *__restrict f, long *__restrict l)
> >> > +{
> >> > +  for (int i = 0; i < 128; i++)
> >> > +    f[i] = l[i];
> >> > +}
> >> > +
> >> > +/* { dg-final { scan-assembler-times {scvtf\t} 2 } } */
> >> > +/* { dg-final { scan-assembler-times {fcvtn\t} 2 } } */
> >> > +/* { dg-final { scan-assembler-times {zip1\t} 1 } } */
> >> > --
> >> > 2.34.1
> >>
> >
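
For reference, here is a small scalar sketch of why the expander is
gated on flag_unsafe_math_optimizations. It is not part of the patch.
The helper names and the input value are made up for illustration, and
it assumes IEEE 754 float/double with the default round-to-nearest-even
mode. Going through double rounds twice, mirroring V2DI -> V2DF -> V2SF,
and that can give a different float than a single direct conversion.

```c
#include <stdint.h>

/* Hypothetical helper: one rounding step, 64-bit integer straight to float.  */
float
convert_direct (int64_t x)
{
  return (float) x;
}

/* Hypothetical helper: two rounding steps, mirroring the
   V2DI -> V2DF -> V2SF sequence used by the expander.  */
float
convert_via_double (int64_t x)
{
  /* volatile keeps the intermediate value a genuine double.  */
  volatile double d = (double) x;
  return (float) d;
}

/* Illustrative input: 2^60 + 2^36 + 1.  It sits just above the midpoint
   between two adjacent floats (2^60 and 2^60 + 2^37), so a direct
   conversion rounds up.  Converting to double first drops the trailing
   +1, leaving an exact tie that then rounds to even, i.e. down.  */
int64_t
double_rounding_example (void)
{
  return (INT64_C (1) << 60) + (INT64_C (1) << 36) + 1;
}
```

Under those assumptions, convert_direct (double_rounding_example ())
gives 2^60 + 2^37 while convert_via_double gives 2^60, so the two paths
differ by one float ulp for this input. int64_t is used here rather than
`long` for the same reason as the testcase fix: `long` is only 32 bits
on some aarch64 targets.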
