[EMAIL PROTECTED] suggests
> Suppose we take a number of exponents that all need checking (or double-checking)
> - say 3, which I'll call p, q and r. (The number doesn't matter to the theory).
> Now use the standard LL test algorithm to calculate the residues for each
> exponent, working modulo M(p)*M(q)*M(r), taking note of the values of the residues
> (the WHOLE value, not just the low n bits) S(p) at iteration p-2, S(q) at
> iteration q-2 and S(r) at iteration r-2. In other words, do the "few extra
> iterations" suggested!
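A minimal Python sketch of the proposed combined test (my illustration, not from the post, using tiny exponents where M(p), M(q) and M(r) are all prime): one shared Lucas-Lehmer sequence is run modulo the product, and the residue is snapshotted at iteration p-2 for each exponent.

```python
def mersenne(p):
    return (1 << p) - 1

def combined_ll(exponents):
    """LL-test several exponents with one shared sequence mod the product."""
    exponents = sorted(exponents)          # exponents must each be >= 3
    modulus = 1
    for p in exponents:
        modulus *= mersenne(p)
    snapshots = {}
    s = 4                                  # S(0) = 4
    for i in range(1, exponents[-1] - 1):  # iterations 1 .. max(p) - 2
        s = (s * s - 2) % modulus
        for p in exponents:
            if i == p - 2:                 # note the WHOLE residue, then
                snapshots[p] = s % mersenne(p)   # reduce mod M(p)
    return snapshots   # M(p) is prime iff snapshots[p] == 0

print(combined_ll([5, 7, 13]))   # prints {5: 0, 7: 0, 13: 0}
```

Reducing the snapshot mod M(p) is valid because M(p) divides the working modulus, so working mod the product never discards information any single test needs.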
>
> If we know S(p) mod (k*M(p)) for any integer k, then obtaining S(p) mod M(p) is
> essentially trivial - one operation, and M(p) being just a string of 1 bits, it's
> very easy to do with a few (extended precision) shift & add operations.
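The shift-and-add reduction can be sketched as follows (my Python, not from the post): since 2^p ≡ 1 (mod 2^p - 1), the high bits of a large value can be folded down with shifts and adds instead of a division.

```python
def reduce_mersenne(x, p):
    """Reduce non-negative x mod 2^p - 1 using only shifts and adds."""
    m = (1 << p) - 1
    while x >> p:                 # fold high bits down until x fits in p bits
        x = (x & m) + (x >> p)
    if x == m:                    # the all-ones pattern represents zero
        x = 0
    return x

p = 7
x = 123456789
assert reduce_mersenne(x, p) == x % ((1 << p) - 1)
```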
>
> The problem is that, superficially, this procedure doesn't seem to actually save
> any time - but it shouldn't take significantly longer than running the three
> independent tests. Practical implementation using DWT / FFT / NTT would obviously
> require a different transform size than would be used (optimally) to do the three
> tests independently. Also the data worked on would be different enough to ward off
> risks of data-dependent processor bugs.
>
> In practice, there may be some small time saving possible. It turns out that the
> fast multiplication algorithms are most efficiently implemented when the transform
> size is a power of 2. So, e.g., numbers around 3 million are relatively expensive
> to check (using Prime95) because they're too large for a 128K transform size but
> too small to need a 256K transform size. However, checking 3 at once would fit a
> 512K transform size reasonably well. There is, of course, a converse trend, in so
> far as larger transform sizes tend to cause more cache misses, therefore cache
> tuning for large transform size may become a significant issue.
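A back-of-envelope check of the sizing claim (my arithmetic, and the exact safe bits-per-word threshold is implementation-dependent): an n-point transform splits a p-bit number into words of roughly p/n bits, and a floating-point FFT caps how many bits per word are safe.

```python
def bits_per_word(p, n):
    """Approximate bits carried per transform word."""
    return p / n

print(f"p ~ 3.0M, 128K transform: {bits_per_word(3_000_000, 128 * 1024):.1f} bits/word")
print(f"p ~ 3.0M, 256K transform: {bits_per_word(3_000_000, 256 * 1024):.1f} bits/word")
# Three exponents near 3 million combined: a ~9M-bit modulus in 512K words.
print(f"p ~ 9.0M, 512K transform: {bits_per_word(9_000_000, 512 * 1024):.1f} bits/word")
```

The combined density (about 17.2 bits/word) falls between the too-dense 128K case (about 22.9) and the wasteful 256K case (about 11.4), which is one way to read the "fits reasonably well" claim.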
>
> My suggested "trick" applies especially to other programs, e.g. MacLucasUNIX,
> which have no intermediate transform sizes between powers of two. A couple of days ago
> my Alpha system finished double-checking M(2361701) using a 128K transform; the
> next exponent I had lined up for it was M(2361721), and it decided that a 256K
> transform is needed, thus doubling the run time 8-( (Don't forget this is
> DIFFERENT CODE to Prime95 - in fact, the Alpha FPU works to only 53 bits precision
> (IEEE G-floating), rather than the 64 bits precision which the Intel x86 FPU is
> capable of)
>
> Can anyone fault my logic? If not, then my suggested approach may possibly be more
> effective than writing extra code into MacLucasUNIX (and other LL testers) to cope
> with transform sizes intermediate between powers of 2. Also saves debugging a
> whole load of extra FFT code 8-)
>
> Regards
> Brian Beesley
Try to keep your lines below 75 characters so they still fit
on an 80-character screen after we copy and indent them.
It is valid to compute a residue modulo M(p) * M(q) * M(r)
and later reduce it modulo M(p), M(q), M(r). But how will the DWT
(Discrete Weighted Transform) modulo this product be done?
Presently, if we want a product modulo 2^p - 1 using
a convolution of length n, we scale the input using powers
of 2^(1/n), and the precise powers depend upon p.
The individual digits of the multiple-precision residue
do not all have the same base -- the pattern depends upon p.
How do you resolve these?
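The digit pattern in question can be made concrete (my sketch, after Crandall and Fagin's irrational-base DWT; treat the formula as an assumption here): with an n-word transform mod 2^p - 1, word j carries ceil(p*(j+1)/n) - ceil(p*j/n) bits, so the mixed-radix layout is tied to p.

```python
def digit_bits(p, n):
    """Bits held by each of the n words in the variable-base layout."""
    c = lambda x: (x + n - 1) // n   # exact ceil(x / n) for non-negative ints
    return [c(p * (j + 1)) - c(p * j) for j in range(n)]

print(digit_bits(31, 8))   # prints [4, 4, 4, 4, 4, 4, 4, 3]
print(digit_bits(29, 8))   # prints [4, 4, 3, 4, 4, 3, 4, 3]
```

Changing p reshuffles which words are "big" and which are "small" (the bit counts always sum to p), so a single transform sized for M(p)*M(q)*M(r) cannot reuse any one exponent's weight and digit pattern.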
Peter
________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm