https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121959

            Bug ID: 121959
           Summary: riscv: vector sign extend instead of zero extend
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rdapp at gcc dot gnu.org
                CC: pan2.li at intel dot com
  Target Milestone: ---
            Target: riscv

I haven't analyzed this in detail yet but figured I'd open a PR for tracking
purposes.

The following example, extracted from x264's satd, compiled with -O3
-march=rv64gcv

#include <stdint.h>

void
lul (int *restrict res, uint8_t *restrict a, uint8_t *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    {
      res[i] = (a[i] - b[i]) << 16;
    }
}

results in

.L3:
        vsetvli a5,a3,e8,mf4,ta,ma
        vle8.v  v1,0(a2)
        vle8.v  v3,0(a1)
        slli    a4,a5,2
        sub     a3,a3,a5
        add     a1,a1,a5
        add     a2,a2,a5
        vwsubu.vv       v2,v3,v1
        vsetvli zero,zero,e32,m1,ta,ma
        vsext.vf2       v1,v2
        vsll.vi v1,v1,16
        vse32.v v1,0(a0)
        add     a0,a0,a4
        bne     a3,zero,.L3

which is reasonable.  LLVM, however, produces:

        ...
        vzext.vf2       v8, v10
        vsll.vi v8, v8, 16

which can be combined into a single vwsll (vector widening shift left).
vwsll zero-extends its source, so we cannot combine a sign-extend + left
shift into it.  Left-shifting a negative number is undefined in C, but I'm
not sure we can make use of that here.
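
As a side note, here is a small standalone C sketch (not GCC code; the
function names are made up for illustration) that compares the two variants
for exactly this case: a 16-bit value widened to 32 bits and then shifted
left by 16.  Sign-extension only changes the upper 16 bits of the widened
value, and a shift by 16 pushes those bits out, so both variants should give
the same 32-bit result.  The shifts are done on uint32_t to sidestep the
signed-shift question.  Whether the vectorizer or combine can exploit this
is a separate matter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: sign-extend a 16-bit value to 32 bits by hand,
   then shift left by 16.  */
static uint32_t
sext_shl16 (uint16_t x)
{
  uint32_t w = x;
  if (x & 0x8000u)
    w |= 0xFFFF0000u;   /* Replicate the sign bit into the upper half.  */
  return w << 16;
}

/* Hypothetical helper: zero-extend a 16-bit value to 32 bits,
   then shift left by 16.  */
static uint32_t
zext_shl16 (uint16_t x)
{
  return (uint32_t) x << 16;
}

/* Count the 16-bit inputs for which the two variants differ.  */
static unsigned
count_mismatches (void)
{
  unsigned mismatches = 0;
  for (uint32_t x = 0; x <= UINT16_MAX; x++)
    if (sext_shl16 ((uint16_t) x) != zext_shl16 ((uint16_t) x))
      mismatches++;
  return mismatches;
}

/* Abort if any input distinguishes the two variants.  */
static void
check_equivalence (void)
{
  assert (count_mismatches () == 0);
}
```

This only covers a shift amount of 16 on a 16-to-32-bit extension; for
smaller shift amounts some of the extended bits survive and the two variants
differ.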

.optimized:

  vect__3.8_93 = .MASK_LEN_LOAD (vectp_a.6_90, 8B, { -1, ... }, _92(D), _112, 0);
  vect_patt_31.9_94 = (vector([4,4]) unsigned short) vect__3.8_93;
  vect__6.12_99 = .MASK_LEN_LOAD (vectp_b.10_96, 8B, { -1, ... }, _98(D), _112, 0);
  vect_patt_29.13_100 = (vector([4,4]) unsigned short) vect__6.12_99;
  vect_patt_27.14_101 = vect_patt_31.9_94 - vect_patt_29.13_100;
  vect_patt_26.15_102 = VIEW_CONVERT_EXPR<vector([4,4]) signed short>(vect_patt_27.14_101);
  vect_patt_22.16_103 = (vector([4,4]) int) vect_patt_26.15_102;
  vect__11.17_104 = vect_patt_22.16_103 << 16;
