Hi Tim,

in fact I did try the OR-alternative. However, it is only a win on older AMD Opterons (16 cycles vs. 20), and it cannot beat the __builtin_clz alternative on Intel.
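
For reference, a minimal sketch of the two variants being compared. It assumes that "OR-alternative" means the OR-folding trick from the aggregate.org page, that the result should be the next power of two strictly greater than the argument, and that unsigned int is 32 bits wide. The function names are made up for illustration; this is not the actual Open MPI code.

```c
#include <stdint.h>

/* Hypothetical name. OR-folding: copy the highest set bit into every
 * lower bit position, then add 1 to reach the next power of two.
 * Wraps around to 0 when the result does not fit in 32 bits. */
static uint32_t next_pow2_or(uint32_t x)
{
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    return x + 1;
}

/* Hypothetical name. Same result via the GCC/Clang builtin
 * __builtin_clz, which counts the leading zero bits of an unsigned int. */
static uint32_t next_pow2_clz(uint32_t x)
{
    if (x == 0)
        return 1;                  /* __builtin_clz(0) is undefined */
    if (x & UINT32_C(0x80000000))
        return 0;                  /* avoid a shift by 32; matches the wraparound above */
    return UINT32_C(1) << (32 - __builtin_clz(x));
}
```

Both functions agree on every 32-bit input, so they can be swapped freely when timing them on a given CPU.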

Best regards,
Rainer

On Wednesday 12 October 2011 11:26:52 Tim Mattox wrote:
> All,
> If you wanted to speedup these routines for processors without
> __builtin_clz, there are a variety of variations in C to implement clz
> efficiently. See Hacker's Delight nlz (number of leading zeros):
> http://www.hackersdelight.org/HDcode/nlz.c.txt
>
> Or from my Ph.D. advisor's magic algorithm's page:
> http://aggregate.org/MAGIC/#Leading%20Zero%20Count
>
> And you can directly implement opal_next_poweroftwo()
> with this:
> http://aggregate.org/MAGIC/#Next%20Largest%20Power%20of%202
>
> The Hacker's Delight webpage (and book) are fun to read for that
> certain kind of person. :-)
> http://www.hackersdelight.org/