On Tue, May 19, 2026 at 11:42 AM Andrew Pinski <
[email protected]> wrote:

> On Tue, May 19, 2026 at 11:27 AM Pengxuan Zheng
> <[email protected]> wrote:
> >
> >
> >
> > On Mon, May 18, 2026 at 3:32 PM Andrew Pinski <[email protected]> wrote:
> >>
> >> On Mon, May 18, 2026 at 10:47 AM Pengxuan Zheng
> >> <[email protected]> wrote:
> >> >
> >> > This enables the vectorizer to vectorize conversion from long to float for
> >> > aarch64 target.
> >> >
> >> > Bootstrapped and tested on aarch64-linux-gnu.
> >> >
> >> >         PR target/123748
> >> >
> >> > gcc/ChangeLog:
> >> >
> >> >         * config/aarch64/aarch64-simd.md (vec_packs_float_v2di): New pattern.
> >> >
> >> > gcc/testsuite/ChangeLog:
> >> >
> >> >         * gcc.target/aarch64/pr123748.c: New test.
> >> >
> >> > Signed-off-by: Pengxuan Zheng <[email protected]>
> >> > ---
> >> >  gcc/config/aarch64/aarch64-simd.md          | 26 +++++++++++++++++++++
> >> >  gcc/testsuite/gcc.target/aarch64/pr123748.c | 13 +++++++++++
> >> >  2 files changed, 39 insertions(+)
> >> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/pr123748.c
> >> >
> >> > diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> >> > index 2e142b1e1ee..21f13564280 100644
> >> > --- a/gcc/config/aarch64/aarch64-simd.md
> >> > +++ b/gcc/config/aarch64/aarch64-simd.md
> >> > @@ -3206,6 +3206,32 @@ (define_insn "<FCVT_FIXED2F:fcvt_fixed_insn><VDQ_HSDI:mode>3"
> >> >    [(set_attr "type" "neon_int_to_fp_<VDQ_HSDI:stype><q>")]
> >> >  )
> >> >
> >> > +(define_expand "vec_packs_float_v2di"
> >> > +  [(set (match_operand:V4SF 0 "register_operand" "=w")
> >> > +       (vec_concat:V4SF
> >> > +         (float:V2SF (match_operand:V2DI 1 "register_operand" "w"))
> >> > +         (float:V2SF (match_operand:V2DI 2 "register_operand" "w"))))]
> >> > +  "TARGET_SIMD && flag_unsafe_math_optimizations"
> >> > +  {
> >> > +    rtx tmp = gen_reg_rtx (V2DFmode);
> >> > +    rtx tmp1 = gen_reg_rtx (V2DFmode);
> >> > +    rtx tmp2 = gen_reg_rtx (V2SFmode);
> >> > +    rtx tmp3 = gen_reg_rtx (V2SFmode);
> >> > +    emit_insn (gen_floatv2div2df2 (tmp, operands[1]));
> >> > +    emit_insn (gen_floatv2div2df2 (tmp1, operands[2]));
> >> > +    emit_insn (gen_truncv2dfv2sf2 (tmp2, tmp));
> >> > +    emit_insn (gen_truncv2dfv2sf2 (tmp3, tmp1));
> >>
> >> Can you add a comment in front of this which describes what it is doing?
> >> Something like:
> >> /* V4DI -> V4SF is done as 2*V2DI->2*V2DF->2*V2SF and then combined
> >> together to form V4SF.  */
> >> /* Since there is an extra rounding step, unsafe math optimization
> >> needs to be on.  */
> >>
> >> Otherwise ok.
> >
> >
> > Thanks, Andrew. I've added the comment you suggested and
> > pushed the change as r17-606-gf2d10af76.
>
> I missed something during the review; it is just a testcase issue.
> Can you change `long` to be `long long` so the testcase works on
> aarch64 targets where `long` is 32 bits (e.g. mingw)?
>

Sure, I've updated the testcase and committed as obvious (
r17-612-gc99177366b).
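
For reference, here is a rough sketch of the loop with the wider element type. The function name `convert_loop` and the static assertion are illustrative assumptions, not the committed testcase. The point is only that `long long` is at least 64 bits on every target, whereas `long` is 32 bits on LLP64 targets such as mingw, so the V2DI pattern would not be exercised there:

```c
#include <limits.h>

/* `long long` is guaranteed to be at least 64 bits wide, unlike `long`,
   which is only 32 bits on LLP64 targets.  This assertion is an
   illustrative addition, not part of the committed test.  */
_Static_assert (sizeof (long long) * CHAR_BIT >= 64,
                "long long must be at least 64 bits wide");

/* Hypothetical name; same loop shape as the testcase, with 64-bit
   integer inputs converted to float.  */
void
convert_loop (float *__restrict out, long long *__restrict in)
{
  for (int i = 0; i < 128; i++)
    out[i] = in[i];
}
```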

Thanks,
Pengxuan

>
> Thanks,
> Andrew
>
> >
> > Thanks,
> > Pengxuan
> >>
> >>
> >> Thanks,
> >> Andrew
> >>
> >>
> >> > +    if (BYTES_BIG_ENDIAN)
> >> > +      std::swap (tmp2, tmp3);
> >> > +
> >> > +    rtx tmp4 = gen_reg_rtx (V2DImode);
> >> > +    emit_insn (gen_aarch64_zip1v2di_low (tmp4, gen_lowpart (DImode, tmp2),
> >> > +                                        gen_lowpart (DImode, tmp3)));
> >> > +    emit_move_insn (operands[0], gen_lowpart (V4SFmode, tmp4));
> >> > +    DONE;
> >> > +  }
> >> > +)
> >> > +
> >> >  ;; ??? Note that the vectorizer usage of the vec_unpacks_[lo/hi] patterns
> >> >  ;; is inconsistent with vector ordering elsewhere in the compiler, in that
> >> >  ;; the meaning of HI and LO changes depending on the target endianness.
> >> > diff --git a/gcc/testsuite/gcc.target/aarch64/pr123748.c b/gcc/testsuite/gcc.target/aarch64/pr123748.c
> >> > new file mode 100644
> >> > index 00000000000..8ba290cf12d
> >> > --- /dev/null
> >> > +++ b/gcc/testsuite/gcc.target/aarch64/pr123748.c
> >> > @@ -0,0 +1,13 @@
> >> > +/* { dg-do compile } */
> >> > +/* { dg-options "-Ofast" } */
> >> > +
> >> > +void
> >> > +f (float *__restrict f, long *__restrict l)
> >> > +{
> >> > +  for (int i = 0; i < 128; i++)
> >> > +    f[i] = l[i];
> >> > +}
> >> > +
> >> > +/* { dg-final { scan-assembler-times {scvtf\t} 2 } } */
> >> > +/* { dg-final { scan-assembler-times {fcvtn\t} 2 } } */
> >> > +/* { dg-final { scan-assembler-times {zip1\t} 1 } } */
> >> > --
> >> > 2.34.1
> >> >
>
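
As a side note on the review comment about the extra rounding step: the sketch below is a scalar illustration, not code from the patch, of why going through double can change the result. The helper names are hypothetical, and it assumes float and double are IEEE binary32/binary64 with round-to-nearest-even and no excess precision. For x = 2^62 + 2^38 + 1, a single rounding to float goes up (x is just above the midpoint between two adjacent floats), while rounding to double first drops the trailing 1, lands exactly on the midpoint, and the second rounding then ties to even and goes down:

```c
/* One rounding step: 64-bit integer straight to float.  */
float
convert_direct (long long x)
{
  return (float) x;
}

/* Two rounding steps: integer -> double -> float, mirroring the
   V2DI -> V2DF -> V2SF sequence on a single element.  The volatile
   keeps the intermediate double from being optimized away.  */
float
convert_via_double (long long x)
{
  volatile double d = (double) x;
  return (float) d;
}

/* Returns nonzero when the two conversion paths disagree for X.  */
int
double_rounding_differs (long long x)
{
  return convert_direct (x) != convert_via_double (x);
}
```

Calling `double_rounding_differs ((1LL << 62) + (1LL << 38) + 1)` should return nonzero under those assumptions, which is why the expander is guarded by flag_unsafe_math_optimizations.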
