Just for fun there's a good 64-bit reverse hack in Hank Warrens excellent "Hackers Delight" book based on Knuth's algorithm. Of course, native instructions should be faster.

Charles, I've sent you a couple of off list mails. Looks like I'm not getting past your spam filter.

/* How would you extend Knuth's algorithm to rotating a 64-bit quantity
on a 64-bit machine?
   A simple method is to first swap the two halves of the 64-bit word,
which takes one instruction if you have the rotate shift. Then, rotate
left 15 positions the two halves of the 64-bit word. This takes five
operations:

   x = (x & 0x0001FFFF0001FFFF) << 15 | (x & 0xFFFE0000FFFE0000) >> 17;

Then do the 10-swap, 4-swap, and 2-swap as in Knuth's code but with each
mask m doubled to the form mm. Thus this method adds five operations to
Knuth's code, making it 24 operations for the "exclusive or" version
(rev14a), if you have the rotate shift, or "swap register halves."
   Another way is to develop code similar to Knuth's that first rotates
left 31 positions, and then does a 20-swap, an 8-swap, a 4-swap, and a
2-swap. This adds one stage of swapping, which takes seven operations as
in rev14, or 6 operations as in rev14a. Thus the simple method seems to
be best, in terms of operation count (but not if parallelism in rev14
can be used to advantage).
   First, the simple method. 24 operations if the swap (rotate 32)
counts as 1. */

unsigned long long rev15(unsigned long long x) {
   unsigned long long t;

   x = (x << 32) | (x >> 32);   // Swap register halves.
   x = (x & 0x0001FFFF0001FFFFLL) << 15 | // Rotate left
       (x & 0xFFFE0000FFFE0000LL) >> 17;  // 15.
   t = (x ^ (x >> 10)) & 0x003F801F003F801FLL;
   x = (t | (t << 10)) ^ x;
   t = (x ^ (x >> 4)) & 0x0E0384210E038421LL;
   x = (t | (t << 4)) ^ x;
   t = (x ^ (x >> 2)) & 0x2248884222488842LL;
   x = (t | (t << 2)) ^ x;
   return x;
}

On 24/07/2013 8:30 PM, Charles Mills wrote:
Thanks all.

You're right, "just how fast DOES this code need to be?" And the answer is I
should know, but I don't. I don't want to waste the customer's cycles. I am
smart enough to know that I am too dumb to know how fast it needs to be. The
right answer lies in profiling, and some other task has always been just a
little higher priority than profiling.

Thanks! Great link! The De Bruijn thing is amazing. I was a math minor but I
hated it. I am very weak on the higher math relevant to programming.

Charles

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:[email protected]] On
Behalf Of Andrew Rowley
Sent: Wednesday, July 24, 2013 8:17 AM
To: [email protected]
Subject: Re: Is there a "reverse bits" hardware instruction?

How fast does this code need to be? David's ffs64 looked pretty good to my
inexpert eye, I think you would have to be running it very frequently for
something to be measurably faster.

There are some similar discussions here, including some branchless
techniques that probably would be faster (not necessarily detectably):
http://stackoverflow.com/questions/757059/position-of-least-significant-bit-
that-is-set

One answer also talks about clearing the lowest set bit.

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN

Reply via email to