Re: RFC: Simplify rules for ctz/clz patterns and RTL

Segher Boessenkool Wed, 15 Aug 2007 12:53:06 -0700

I suppose you're using (assuming 32-bit)
    ctz(x) := 31 - clz(x & -x)
now, which gives -1 for 0; and the version you're looking for is
    ctz(x) := 32 - clz(~x & (x-1))
which gives 32 for 0.
Thanks! That's, unfortunately, one more instruction, although I guessa lot of chips have "a & ~b" as one operation.


Yes, it's exactly the same cost on PowerPC, and on most other
RISC architectures.

It looks like ~x & (x-1) turns any number into 000...111... where theboundary between zeroes and ones lies at the lowest 1 in the original.


Exactly.  "To the right of the lowest 1".

Is popcount really slow on PowerPC? (Compared to clz?) Ideally onewould choose between the two expansions based on RTL costs, but theonly architectures it matters for are i386 and powerpc, and neitherof them define the cost of either clz or popcount.


Andrew answered this already.  Adding clz/popcount to the cost
tables seems like a good idea, yes.


Segher

Re: RFC: Simplify rules for ctz/clz patterns and RTL

Reply via email to