On 02/09/2018 05:42 AM, Roger Sayle wrote: > The following patch implements a number of __builtin_popcount related > optimizations. > (i) popcount(x) == 0 can be simplified to x==0, and popcount(x) != 0 to > x!=0. > (ii) popcount(x&1) can be simplified to x&1, and for unsigned x, > popcount(x>>31) to x>>31. > (iii) popcount (x&6) + popcount(y&16) can be simplified to > popcount((x&6)|(y&16)) > > These may seem obscure transformations, but performing these types of > POPCOUNT > operations are often the performance critical steps in some cheminformatics > applications. > > To implement the above transformations I've introduced the tree_nonzero_bits > function, > which is a tree-level version of rtlanal's nonzero_bits used by the RTL > optimizers. > > The following patch has been tested on x86_64-pc-linux-gnu with a "make > bootstrap" > and "make check" with no regressions, and passes for the four new gcc.dg > test cases. > > Many thanks In advance. Best regards, > > Roger > -- > Roger Sayle, PhD. > NextMove Software Limited > Innovation Centre (Unit 23), Cambridge Science Park, Cambridge, CB4 0EY > > 2018-02-09 Roger Sayle <ro...@nextmovesoftware.com> > > * fold-const.c (tree_nonzero_bits): New function. > * fold-const.h (tree_nonzero_bits): Likewise. > * match.pd (POPCOUNT): New patterns to fold BUILTIN_POPCOUNT and > friends. POPCOUNT(x&1) => x&1, POPCOUNT(x)==0 => x==0, etc. > > 2018-02-09 Roger Sayle <ro...@nextmovesoftware.com> > > * gcc.dg/fold-popcount-1.c: New testcase. > * gcc.dg/fold-popcount-2.c: New testcase. > * gcc.dg/fold-popcount-3.c: New testcase. > * gcc.dg/fold-popcount-4.c: New testcase. > > > > > Index: gcc/fold-const.c > =================================================================== > --- gcc/fold-const.c (revision 257227) > +++ gcc/fold-const.c (working copy) > @@ -14580,6 +14580,75 @@ > return string + offset; > } > > +/* Given a tree T, compute which bits in T may be nonzero. */ > + > +wide_int > +tree_nonzero_bits (const_tree t) > +{ > + switch (TREE_CODE (t)) > + { > + case BIT_IOR_EXPR: > + case BIT_XOR_EXPR: > + return wi::bit_or (tree_nonzero_bits (TREE_OPERAND (t, 0)), > + tree_nonzero_bits (TREE_OPERAND (t, 1))); Hmm. I think this will potentially have too many bits set in the BIT_XOR case. Is there some reason you didn't use wi::bit_xor for that case?
We can probably go ahead and ACK this once that question is resolved. THanks, jeff