On Wed, Dec 17, 2014 at 04:08:40PM +0000, Trollgeir via Digitalmars-d-learn wrote: > On Wednesday, 17 December 2014 at 14:58:13 UTC, Adam D. Ruppe wrote: > >On Wednesday, 17 December 2014 at 14:12:16 UTC, Trollgeir wrote: > >>I'd expect the bt function to be up to 32 times faster as I thought > >>it only compared two bits, and not the entire length of bits in the > >>uint. > > > >The processor doesn't work in terms of bits like that - it still > >needs to look at the whole integer. In fact, according to my (OLD) > >asm reference, the bt instruction is slower than the and instruction > >at the cpu level. > > > >I think it has to do a wee bit more work, translating the 16 into a > >mask then moving the result into the flag... then moving the flag > >back into a register to return the value. (That last step could > >probably be skipped if you do an if() on it and the compiler > >optimizes the branch, and the first step might be skipped too if it > >is a constant, since the compiler can rewrite the instruction. So > >given that, I'd expect what you saw: no difference when they are > >optimized to the same thing or when the CPU's stars align right, and > >& a bit faster when bt isn't optimized) > > > >bt() and friends are special instructions for specialized use cases. > >Probably useful for threading and stuff. > > > Thanks for the explanation, I suspected it might work something like > that. > > For my implementation - I have bits shifting to the right every > update, and I want to check if it has reached certain markers. Hence, > I felt it was really inefficient to check every single bit in the uint > when I was only interested in some specific ones. Is an alternative > (optimized) version of bt even possible?
That's an inaccurate understanding of how the CPU works. It does not check the bits "one by one"; the & operation translates to a single asm instruction that performs the bitwise-and of *all* bits in parallel. There is no faster implementation. What causes the slowdown is probably the conversion of the bit index into a mask. You could alleviate this part by precomputing the mask (if you're testing for bits in the same position in many values -- this will pay for the cost of computing the mask once, and reusing it many times thereafter, instead of computing it every single time). T -- Computers shouldn't beep through the keyhole.