https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97770

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|target                      |tree-optimization

--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
So we vectorize to

  _18 = .POPCOUNT (vect__5.7_22);
  _17 = .POPCOUNT (vect__5.7_21);
  vect__6.8_16 = VEC_PACK_TRUNC_EXPR <_18, _17>;
  _6 = 0;
  _7 = dest_13(D) + _2;
  vect__8.9_10 = [vec_unpack_lo_expr] vect__6.8_16;
  vect__8.9_9 = [vec_unpack_hi_expr] vect__6.8_16;
  _8 = (long long int) _6;

which is exactly the issue: in the scalar code we have an 'int'-producing
popcount with a 'long long' argument, but the vector IFN produces a result of
the same width as its argument.  So the vectorizer compensates for that
(VEC_PACK_TRUNC_EXPR) and then vectorizes the widening that is present in the
scalar code (vec_unpack_{lo,hi}_expr).  The fix for this, and for the missing
byte and word variants, is to add a pattern to tree-vect-patterns.c for this
case, matching it to the .POPCOUNT internal function.  That possibly applies
to other bit ops too, like parity, ctz, ffs, etc.  There are quite a few
_widen helpers in the pattern recognition code, so I'm not sure how
complicated it is to match

 (long)popcountl(long)

and

 (short)popcount((int)short)

Richard may have a good idea since he did the last "big" surgery there.
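For reference, a minimal sketch of the two source shapes discussed above (the
widening 'long long' case and a narrowing word-sized case); these are
illustrative loops I'm assuming here, not the exact testcase from the PR:

```c
#include <stdint.h>

/* Widening case: __builtin_popcountll takes a 64-bit argument but
   produces 'int', which the store then widens back to 64 bits.  This is
   the shape the vectorizer currently handles with VEC_PACK_TRUNC_EXPR
   followed by vec_unpack_{lo,hi}_expr instead of a direct .POPCOUNT.  */
void
count_bits64 (long long *dest, const unsigned long long *src, int n)
{
  for (int i = 0; i < n; i++)
    dest[i] = __builtin_popcountll (src[i]);
}

/* Narrowing case: a 16-bit element is promoted to 'int' for the
   popcount and the 'int' result is truncated back to 16 bits, i.e. the
   (short)popcount((int)short) shape mentioned above.  */
void
count_bits16 (uint16_t *dest, const uint16_t *src, int n)
{
  for (int i = 0; i < n; i++)
    dest[i] = (uint16_t) __builtin_popcount ((unsigned int) src[i]);
}
```

In both loops the popcount's result width differs from its argument width,
which is what a tree-vect-patterns.c pattern would need to absorb so the
loop maps to the same-width .POPCOUNT IFN directly.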
