On 05/01/2018 02:48 AM, Marc Glisse wrote:
> On Mon, 30 Apr 2018, Jeff Law wrote:
>
>> On 02/09/2018 05:42 AM, Roger Sayle wrote:
>>> The following patch implements a number of __builtin_popcount related
>>> optimizations.
>>> (i) popcount(x) == 0 can be simplified to x==0, and popcount(x) != 0
>>> to x!=0.
>>> (ii) popcount(x&1) can be simplified to x&1, and for unsigned x,
>>> popcount(x>>31) to x>>31.
>>> (iii) popcount(x&6) + popcount(y&16) can be simplified to
>>> popcount((x&6)|(y&16)).
>>>
>>> These may seem obscure transformations, but these types of POPCOUNT
>>> operations are often the performance-critical steps in some
>>> cheminformatics applications.
>>>
>>> To implement the above transformations I've introduced the function
>>> tree_nonzero_bits, which is a tree-level version of rtlanal's
>>> nonzero_bits used by the RTL optimizers.
>>>
>>> The following patch has been tested on x86_64-pc-linux-gnu with a
>>> "make bootstrap" and "make check" with no regressions, and passes the
>>> four new gcc.dg test cases.
>>>
>>> Many thanks in advance.  Best regards,
>>>
>>> Roger
>>> --
>>> Roger Sayle, PhD.
>>> NextMove Software Limited
>>> Innovation Centre (Unit 23), Cambridge Science Park, Cambridge, CB4 0EY
>>>
>>> 2018-02-09  Roger Sayle  <ro...@nextmovesoftware.com>
>>>
>>>         * fold-const.c (tree_nonzero_bits): New function.
>>>         * fold-const.h (tree_nonzero_bits): Likewise.
>>>         * match.pd (POPCOUNT): New patterns to fold BUILTIN_POPCOUNT
>>>         and friends.  POPCOUNT(x&1) => x&1, POPCOUNT(x)==0 => x==0, etc.
>>>
>>> 2018-02-09  Roger Sayle  <ro...@nextmovesoftware.com>
>>>
>>>         * gcc.dg/fold-popcount-1.c: New testcase.
>>>         * gcc.dg/fold-popcount-2.c: New testcase.
>>>         * gcc.dg/fold-popcount-3.c: New testcase.
>>>         * gcc.dg/fold-popcount-4.c: New testcase.
>>>
>>> Index: gcc/fold-const.c
>>> ===================================================================
>>> --- gcc/fold-const.c	(revision 257227)
>>> +++ gcc/fold-const.c	(working copy)
>>> @@ -14580,6 +14580,75 @@
>>>      return string + offset;
>>>    }
>>>
>>> +/* Given a tree T, compute which bits in T may be nonzero.  */
>>> +
>>> +wide_int
>>> +tree_nonzero_bits (const_tree t)
>>> +{
>>> +  switch (TREE_CODE (t))
>>> +    {
>>> +    case BIT_IOR_EXPR:
>>> +    case BIT_XOR_EXPR:
>>> +      return wi::bit_or (tree_nonzero_bits (TREE_OPERAND (t, 0)),
>>> +			 tree_nonzero_bits (TREE_OPERAND (t, 1)));
>>
>> Hmm.  I think this will potentially have too many bits set in the
>> BIT_XOR case.  Is there some reason you didn't use wi::bit_xor for
>> that case?
>
> You cannot use bit_xor because the bits are only *possibly* set (same as
> you can't say anything about BIT_NOT_EXPR).  We would need to also track
> the bits *certainly* set to start doing clever stuff, and for
> BIT_XOR_EXPR that would still be inconvenient.

Yea, I realized it was a may vs must issue after I signed off for the night.
jeff