https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122277

--- Comment #5 from Robin Dapp <rdapp at gcc dot gnu.org> ---
> but now we do, and single-lane RVVM1SI looks better from your
> cost model.  I believe that before my patch we never tried this
> because there's a conversion around the reduction and

Having had another quick look, I think the V4QI/VLS loop code is worse.
Its costs are lower, but they are for VF=1 rather than VF=4, so of course
the VLA loop is preferred.
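
To make the VF point concrete, here is a rough sketch of comparing two loop bodies per scalar iteration. It is not the vectorizer's actual comparison code: the helper name is hypothetical, the numbers in the usage note are made up, and prologue/epilogue costs are ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not GCC code.  Returns true when loop A is cheaper
   than loop B per scalar iteration, i.e. cost_a / vf_a < cost_b / vf_b.
   The comparison is cross-multiplied so it stays in integer arithmetic.  */
static bool
cheaper_per_scalar_iter (unsigned cost_a, unsigned vf_a,
                         unsigned cost_b, unsigned vf_b)
{
  assert (vf_a > 0 && vf_b > 0);
  return (unsigned long long) cost_a * vf_b
         < (unsigned long long) cost_b * vf_a;
}
```

With made-up body costs of 8 at VF=1 and 20 at VF=4, the VF=4 loop wins (5 per scalar iteration against 8) even though its raw body cost is higher.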

When forcing the vector mode (--param=riscv-autovec-mode=V4QI) I can see
several broadcast "permutes" like:

  vect__59.11_427 = {_428, 0, 0, 0};
  vect__59.12_425 = VEC_PERM_EXPR <vect__59.11_427, vect__59.11_427, { 0, 0, 0, 0 }>;

We don't handle those in the perm_const hook yet.  So their costing is surely
not ideal and, if anything, off in the wrong direction ;)
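
For reference, a minimal scalar model of what such a permute computes, assuming the usual VEC_PERM_EXPR semantics (selector elements index into the concatenation of both inputs, taken modulo 2N). The function names are hypothetical and this is not the backend hook, just an illustration of why an all-zero selector is a broadcast of lane 0:

```c
#include <assert.h>
#include <stddef.h>

/* Scalar model of VEC_PERM_EXPR <v0, v1, sel> on N-element vectors.
   Each selector element indexes the concatenation of v0 and v1, taken
   modulo 2N.  Hypothetical illustration only; out must not alias the
   inputs.  */
static void
vec_perm (const int *v0, const int *v1, const unsigned *sel,
          int *out, size_t n)
{
  assert (n > 0);
  for (size_t i = 0; i < n; i++)
    {
      size_t idx = sel[i] % (2 * n);
      out[i] = idx < n ? v0[idx] : v1[idx - n];
    }
}

/* Returns nonzero when every selector element picks the same input lane,
   i.e. the permute is really a broadcast (splat) of one element.  */
static int
sel_is_broadcast (const unsigned *sel, size_t n)
{
  for (size_t i = 1; i < n; i++)
    if (sel[i] % (2 * n) != sel[0] % (2 * n))
      return 0;
  return 1;
}
```

With the selector { 0, 0, 0, 0 } from the dump above, every output lane is a copy of lane 0 of the first input, so the whole thing could in principle be treated as a scalar broadcast rather than a general permute.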

And, indeed, there is no unrolling, just a single iteration.  Unrolling then
happens later, explicitly.

Also, there are a number of other permutes that weren't there before.

Need to have a closer look tomorrow.
