Mersenne: Re: fast GCD

EWMAYER Sat, 6 Nov 1999 13:43:55 -0800
George Woltman writes:

>Since F24 is 5 million digits your computer would take 5 minutes * 5^2,
>or 2 hours.  Given your machine is faster, my code is 32-bit, my code
>does an extended GCD, the Alpha is a better architecture, and your code
>is probably better - you come up with the factor of 12 difference.

Was your code doing a true Fermat-mod DWT, or were you attempting to
factor instead the Mersenne number M(2^25-1) = (2^(2^24)-1)*(2^(2^24)+1)?
In the latter case your GCD would involve vectors twice as long, which
would mean a factor-of-4 runlength increase for an O(n^2) algorithm.

>Now the good news.  Crandall's giants code has an implementation of
>the O (n * log n * log n) GCD in it.  It was painfully slow.  After two
>days of optimizing, I will be able to do that 5 million digit
>GCD in an estimated 30 minutes.

Impressive - can you give us a brief summary of the most time-impacting
changes you made, and were these to the C source or in ASM?

>BTW, I know of no other publicly available implemetation of the fast GCD.
>It was written by J. P. Buhler.

I know little about it, except that Crandall mentioned it took them
the better part of 6 months to get the thing written and debugged.
Better them than you, right? :)

Jason Papadopoulos writes:

>I've heard lots of complaints that IA64 has no fully-integer multiply
>(i.e. you have to load your numbers into an FPU register and pull the
>result back). Actually, there have been a lot of gripes about IA64;
>Intel's developers' guide is great reading but quite difficult at times.

Respectfully, I believe this is a point on which the purists may be plain
wrong. One thing that impresses me about the IA64 (at least judging from
the hardware guide) is the flexibility regarding loading integers into
either a 64-bit integer (a.k.a. general-purpose) register or into a
floating register mantissa - sure, the IA64 has no separate integer
multiply unit, but if it can quickly load two 64-bit ints into floating
registers and then use the floating multiply unit (which is fully pipe-
lined and very fast) to get either the low or high 64 bits of the 128-bit
(exact integer) product, then that's a real gain.

One thing about the above operation (specifically the upper 64-bit
calculation) that intrigues me is how the IA64 effects error correction
of the floating product. On an IEEE-compliant machine, one can use an
FMUL to get the upper 52 bits of an integer product of up to 52+64=116
bits in length, a standard 64-bit integer MUL to get the low 64 bits,
and one uses the remaining bit (the lowest-order bit of the 53-bit
floating mantissa) to compare to the corresponding bit in the 64-bit
integer product to effect error correction. Perhaps the IA-64 uses a 
65-th mantissa bit in the floating multiplier to do this?

Cheers,
-Ernst

ftp://209.133.33.182/pub/mayer/home.html

_________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers
Mersenne: Re: fast GCD

Reply via email to