https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97770

--- Comment #5 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Richard Biener from comment #4)
> What's missing is middle-end folding support to narrow popcount to the
> appropriate internal function call with byte/half-word width when target
> support
> is available.  But I'm quite sure there's no scalar popcount instruction
> operating on half-word or byte pieces of a GPR?
> 
> Alternatively the vectorizer can use patterns to do this.

Yes, but for 64bit width, vectorizer generate suboptimal code.

sse #c3

  vector(2) long long unsigned int vect__4.6;
  vector(2) long long unsigned int vect__4.5;
  vector(2) long long unsigned int _8;
  vector(2) long long unsigned int _26;

  ...
  ...

  _8 = .POPCOUNT (vect__4.5_16);
  _26 = .POPCOUNT (vect__4.6_9);
  vect__5.7_22 = VEC_PACK_TRUNC_EXPR <_8, _26>; --- Why do we do this?
  vector(4) int vect__5.7;


It could generate directly

  v4di = .POPCOUNT (v4di);

Reply via email to