[Bug tree-optimization/80570] auto-vectorizing int->double conversion should use half-width memory operands to avoid shuffles, instead of load+extract

pinskia at gcc dot gnu.org via Gcc-bugs Sun, 26 Sep 2021 19:45:57 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80570


Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|target                      |tree-optimization

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
  vect__4.5_24 = MEM <vector(8) int> [(int *)ip_12 + ivtmp.15_28 * 1];
  vect_tmp_14.6_23 = [vec_unpack_float_lo_expr] vect__4.5_24;
  vect_tmp_14.6_22 = [vec_unpack_float_hi_expr] vect__4.5_24;
  MEM <vector(4) double> [(double *)dp_10 + ivtmp.15_28 * 2] = vect_tmp_14.6_23;
  MEM <vector(4) double> [(double *)dp_10 + 32B + ivtmp.15_28 * 2] = vect_tmp_14.6_22;
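
For reference, a loop of roughly the following shape would be expected to vectorize into GIMPLE like the dump above: one full-width int load, followed by a lo/hi unpack-and-convert pair. This is only a sketch. The function name, signature and trip count are assumptions picked to match the ip_/dp_/tmp_ SSA names in the dump, not necessarily the testcase attached to the bug.

```c
#include <stddef.h>

/* Hypothetical reproducer (assumed, not taken from the bug report).
   Each int element is widened to double and stored, so one vector of
   N ints yields two vectors of N/2 doubles. */
void convert_int_to_double(double *restrict dp, const int *restrict ip,
                           size_t n)
{
    for (size_t i = 0; i < n; i++) {
        double tmp = ip[i];   /* int -> double conversion */
        dp[i] = tmp;
    }
}
```

The vector(8) int / vector(4) double types in the dump imply 256-bit vectors, so something like -O3 with AVX2 enabled is a reasonable guess for the options used.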

The same load-then-unpack pattern shows up even on aarch64:

.L2:
        ldr     q0, [x1], 16
        sxtl    v1.2d, v0.2s
        sxtl2   v0.2d, v0.4s
        scvtf   v1.2d, v1.2d
        scvtf   v0.2d, v0.2d
        stp     q1, q0, [x0]
        add     x0, x0, 32
        cmp     x2, x1
        bne     .L2

That said, the aarch64 code above is actually decent.
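
To make the aarch64 sequence easier to follow, here is a scalar C model of what one loop iteration computes. It is an illustration only: the function name and the temporary arrays are hypothetical, and the lane-to-array mapping simply mirrors the register names in the assembly.

```c
#include <stdint.h>

/* Scalar model (assumed, for illustration) of one iteration of the
   aarch64 loop: load four ints, sign-extend each half to 64 bits,
   convert to double, store four doubles. */
void convert4_model(double *dp, const int32_t *ip)
{
    int32_t v0[4];            /* ldr q0: four 32-bit lanes */
    int64_t lo[2], hi[2];

    for (int k = 0; k < 4; k++)
        v0[k] = ip[k];

    for (int k = 0; k < 2; k++) {
        lo[k] = v0[k];        /* sxtl:  sign-extend lanes 0..1 */
        hi[k] = v0[k + 2];    /* sxtl2: sign-extend lanes 2..3 */
    }

    for (int k = 0; k < 2; k++) {
        dp[k]     = (double)lo[k];   /* scvtf on the low half  */
        dp[k + 2] = (double)hi[k];   /* scvtf on the high half */
    }                                /* stp q1, q0 stores both */
}
```

Here the widening of the high half is a single instruction (sxtl2) that reads the upper lanes directly, so no separate extract or shuffle is needed.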

[Bug tree-optimization/80570] auto-vectorizing int->double conversion should use half-width memory operands to avoid shuffles, instead of load+extract
