Hi, Try your skills at some bit twiddling !!

We have the existing MPIR macro 
modlimb_invert(i,x)     in gmp-impl.h
which calculates the Hensel(2-adic) inverse i of x 

ie)   x*i == 1 (mod  2^64) or whatever the limb size is

Now the existing macro starts out with a table lookup to get the first 8 
bits , which it then refines. Can we do better ?
Assuming the table is in the L1 Cache (how likely is this?) , then a cache 
lookup has a latency of 3cycles(AMD,INTEL) 4cycle(nehalem) and ?(others)

So currently
i=table[(x/2)%128]
 we need 5 cycles and a 128byte table , or 4 cycles and a 256byte table. If we 
are in a shared library rather than a static one , then we also need a PIC 
lookup ( which is at least 2? cycles)

Using some bit twiddling we can get

We can get 3bits in 0 cycles
i=x

4bits in 2 cycles
unsigned long c=0xF050309070D0B010;i= c>>((x<<2)%64);// %64 is free for shifts

5 bits in 3cycles
unsigned long c=0x2D282D282D282D28;i= x^((c>>(x%64))<<3);
or
unsigned int c=0x2D282D28;i= x^((c>>(x%32))<<3);

6 bits in 4 cycles
unsigned long c=0x1459945105598540;i= x^((c>>((x*2)%64))<<3);

6 bits in 7 cycles
i=x*(2-x*x);  // ie using i=i*(2-x*i) on our 3 bit inverse

8bits in 9 cycles
get 4 bits as above , and use i=i*(2-x*i);

Once we get to 8 bits then we can use the standard i=i*(2-x*i) quadratic 
iteration to get to 16,32,64 bits . We can save a one cycle by using a 
quartic iteration. 

Can you do any better?

I'm fairly sure 7bits can be done 5 or less cycles , so I expect it can be 
done( go parallel, although for x86_64 variable shifts can't be 
parallel(require CX reg for the count), mmx is possible but have to then 
reduce the count mod 64 

Have fun.

Jason


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"mpir-devel" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/mpir-devel?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to