https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97770
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|target                      |tree-optimization

--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
So we vectorize to

  _18 = .POPCOUNT (vect__5.7_22);
  _17 = .POPCOUNT (vect__5.7_21);
  vect__6.8_16 = VEC_PACK_TRUNC_EXPR <_18, _17>;
  _6 = 0;
  _7 = dest_13(D) + _2;
  vect__8.9_10 = [vec_unpack_lo_expr] vect__6.8_16;
  vect__8.9_9 = [vec_unpack_hi_expr] vect__6.8_16;
  _8 = (long long int) _6;

which is exactly the issue: in the scalar code we have an 'int'-producing
popcount with a 'long long' argument, but the vector IFN produces a result
of the same width as its argument.  So the vectorizer compensates for that
(VEC_PACK_TRUNC_EXPR) and then vectorizes the widening that is present in
the scalar code (vec_unpack_{lo,hi}_expr).

The fix for this, and for the missing byte and word variants, is to add a
pattern to tree-vect-patterns.c matching this case to the .POPCOUNT
internal function.  That possibly applies to other bit operations, too,
like parity, ctz, ffs, etc.

There are quite a few _widen helpers in the pattern recognition code, so
I'm not sure how complicated it is to match (long)popcountl(long) and
(short)popcount((int)short).  Richard may have a good idea since he did
the last "big" surgery there.
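
For reference, a scalar kernel of this shape reproduces the IL above (a
minimal sketch with illustrative names; this is not the PR's testcase):

  /* __builtin_popcountll returns 'int' even though its argument is 64
     bits wide, so storing into a 64-bit destination introduces a
     widening conversion in the scalar IL that the vectorizer then has
     to mirror with pack/unpack.  */
  void
  f (long long *dest, unsigned long long *src, int n)
  {
    for (int i = 0; i < n; i++)
      dest[i] = __builtin_popcountll (src[i]);
  }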
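
Roughly, what such a tree-vect-patterns.c pattern would do (a sketch of
the intended transform, not an actual implementation; 'patt_6' is a
hypothetical pattern temporary) is recognize the popcount-plus-widening
pair and replace it with a single .POPCOUNT whose result type already
matches the argument width, so no VEC_PACK_TRUNC_EXPR/vec_unpack pair is
needed:

  /* Scalar IL before pattern recognition (shape only):  */
  _4 = __builtin_popcountll (_5);   /* int result, DImode argument */
  _6 = (long long int) _4;          /* widening the vectorizer must mirror */

  /* After a (hypothetical) popcount pattern:  */
  patt_6 = .POPCOUNT (_5);          /* long long result, same width as _5 */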