On Thu, May 14, 2026 at 11:29 PM liuhongt <[email protected]> wrote:
>
> When X is nonzero, X ^ (X - 1) produces a mask of trailing zeros plus
> the lowest set bit, so popcount of that expression equals ctz(X) + 1.
> Folding to CTZ avoids the blsmsk+popcnt (and a cmove for the zero
> case) sequence on targets with a direct CTZ.
>
> gcc/ChangeLog:
>
>         PR middle-end/124630
>         * match.pd (popcount (x ^ (x - 1))): Fold to ctz (x) + 1 when
>         x is nonzero and CTZ is directly supported.
>
> gcc/testsuite/ChangeLog:
>
>         PR middle-end/124630
>         * gcc.target/i386/pr124630.c: New test.
> ---
>  gcc/match.pd                             | 11 +++++++++++
>  gcc/testsuite/gcc.target/i386/pr124630.c | 12 ++++++++++++
>  2 files changed, 23 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr124630.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index b037b1a2876..10fba2cb788 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -10377,6 +10377,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>        (BUILT_IN_POPCOUNT (convert:type0 @0))
>        (if (cfn == CFN_BUILT_IN_POPCOUNTLL)
>         (BUILT_IN_POPCOUNTLL (convert:type0 @0))))))))
> +
> +/* popcount (X ^ (X - 1)) is CTZ (X) + 1 when X is nonzero.  */
> +(simplify
> +  (POPCOUNT (bit_xor:c tree_expr_nonzero_p@0
> +                     (plus @0 integer_minus_onep)))
> +  (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> +       && direct_internal_fn_supported_p (IFN_CTZ, TREE_TYPE (@0),
> +                                         OPTIMIZE_FOR_SPEED))
> +   (with { tree utype = unsigned_type_for (TREE_TYPE (@0)); }
> +    (plus (CTZ:type (convert:utype @0))
> +         { build_one_cst (type); }))))
>  #endif

I was going to suggest placing this pattern next to the
`ffs(a) -> CTZ(a)+1` pattern, but it seems that pattern itself needs to
be moved.
So I think this location is fine.

Can you format it the same as the FFS pattern though:
```
/* __builtin_ffs needs to deal on many targets with the possible zero
   argument.  If we know the argument is always non-zero, __builtin_ctz + 1
   should lead to better code.  */
(simplify
 (FFS tree_expr_nonzero_p@0)
 (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
      && direct_internal_fn_supported_p (IFN_CTZ, TREE_TYPE (@0),
                                         OPTIMIZE_FOR_SPEED))
  (with { tree utype = unsigned_type_for (TREE_TYPE (@0)); }
   (plus (CTZ:type (convert:utype @0)) { build_one_cst (type); }))))
```

And add a comment saying that `popcount (X ^ (X - 1))` is the same as
`ffs (X)` when X is nonzero, and that using ctz + 1 will generate better
code.
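
To make the equivalence concrete, here is a small standalone sketch (not
part of the patch) that compares the three forms for a nonzero argument.
The helper names are made up for illustration; only the `__builtin_*`
calls are real GCC builtins.
```c
#include <assert.h>

/* Hypothetical helpers, for illustration only.  */

/* X ^ (X - 1) sets the lowest set bit of X and every bit below it.  */
static int popcount_form (unsigned x)
{
  return __builtin_popcount (x ^ (x - 1));
}

/* __builtin_ctz is undefined for 0, so callers must pass x != 0.  */
static int ctz_form (unsigned x)
{
  return __builtin_ctz (x) + 1;
}

/* ffs returns the 1-based index of the lowest set bit.  */
static int ffs_form (unsigned x)
{
  return __builtin_ffs ((int) x);
}

/* Return 1 if all three forms agree for a nonzero X.  */
static int forms_agree (unsigned x)
{
  assert (x != 0);
  return popcount_form (x) == ctz_form (x)
	 && ctz_form (x) == ffs_form (x);
}
```
For example, with x = 8 (binary 1000), x ^ (x - 1) is binary 1111, so the
popcount is 4, which matches ctz (8) + 1 = 3 + 1.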

>
>  /* PARITY simplifications.  */
> diff --git a/gcc/testsuite/gcc.target/i386/pr124630.c b/gcc/testsuite/gcc.target/i386/pr124630.c
> new file mode 100644
> index 00000000000..440aede39fa
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr124630.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mbmi" } */
> +/* { dg-final { scan-assembler-not "blsmsk" } } */
> +/* { dg-final { scan-assembler-not "cmove" } } */
> +
> +unsigned foo (unsigned a)
> +{
> +    if (a != 0)
> +     return __builtin_popcount (a ^ (a - 1)) - 1;
> +    else
> +     return 32;
> +}

Please add a generic testcase, not just an x86_64-specific one.
For this you can use the ctz target-supports check, which tests whether
ctz will result in a call or not.
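
Something along these lines might work as a starting point; treat it as a
rough sketch rather than a finished test. The effective-target name, the
dump options and the scanned string are my assumptions and should be
checked against target-supports.exp and the actual optimized dump.
```c
/* Sketch of a generic testcase.  The dg directives below are assumptions:
   verify the effective-target name and the dump pattern before use.  */
/* { dg-do compile } */
/* { dg-require-effective-target ctz } */
/* { dg-options "-O2 -fdump-tree-optimized" } */

unsigned foo (unsigned a)
{
  /* The guard makes A known nonzero at the popcount, so the fold to
     ctz (a) + 1 can apply; the trailing - 1 then leaves plain ctz (a).  */
  if (a != 0)
    return __builtin_popcount (a ^ (a - 1)) - 1;
  else
    return 32;	/* Assumes a 32-bit unsigned int, as in the i386 test.  */
}

/* { dg-final { scan-tree-dump-not "__builtin_popcount" "optimized" } } */
```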

Thanks,
Drea

> --
> 2.34.1
>
