[EMAIL PROTECTED] suggests
> Suppose we take a number of exponents that all need checking (or double-checking)
> - say 3, which I'll call p, q and r. (The number doesn't matter to the theory).
> Now use the standard LL test algorithm to calculate the residues for each
> exponent, working modulo M(p)*M(q)*M(r), taking note of the values of the residues
> (the WHOLE value, not just the low n bits) S(p) at iteration p-2, S(q) at
> iteration q-2 and S(r) at iteration r-2. In other words, do the "few extra
> iterations" suggested!
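A minimal Python sketch of the proposed combined test (my illustration, not from the post, using tiny exponents where M(p), M(q) and M(r) are all prime): one shared Lucas-Lehmer sequence is run modulo the product, and the residue is snapshotted at iteration p-2 for each exponent.

```python
def mersenne(p):
    return (1 << p) - 1

def combined_ll(exponents):
    """LL-test several exponents with one shared sequence mod the product."""
    exponents = sorted(exponents)          # exponents must each be >= 3
    modulus = 1
    for p in exponents:
        modulus *= mersenne(p)
    snapshots = {}
    s = 4                                  # S(0) = 4
    for i in range(1, exponents[-1] - 1):  # iterations 1 .. max(p) - 2
        s = (s * s - 2) % modulus
        for p in exponents:
            if i == p - 2:                 # note the WHOLE residue, then
                snapshots[p] = s % mersenne(p)   # reduce mod M(p)
    return snapshots   # M(p) is prime iff snapshots[p] == 0

print(combined_ll([5, 7, 13]))   # prints {5: 0, 7: 0, 13: 0}
```

Reducing the snapshot mod M(p) is valid because M(p) divides the working modulus, so working mod the product never discards information any single test needs.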
>
> If we know S(p) mod (k*M(p)) for any integer k, then obtaining S(p) mod M(p) is
> essentially trivial - one operation, and M(p) being just a string of 1 bits, it's
> very easy to do with a few (extended precision) shift & add operations.
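The shift-and-add reduction can be sketched as follows (my Python, not from the post): since 2^p ≡ 1 (mod 2^p - 1), the high bits of a large value can be folded down with shifts and adds instead of a division.

```python
def reduce_mersenne(x, p):
    """Reduce non-negative x mod 2^p - 1 using only shifts and adds."""
    m = (1 << p) - 1
    while x >> p:                 # fold high bits down until x fits in p bits
        x = (x & m) + (x >> p)
    if x == m:                    # the all-ones pattern represents zero
        x = 0
    return x

p = 7
x = 123456789
assert reduce_mersenne(x, p) == x % ((1 << p) - 1)
```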
>
> The problem is that, superficially, this procedure doesn't seem to actually save
> any time - but it shouldn't take significantly longer than running the three
> independent tests. Practical implementation using DWT / FFT / NTT would obviously
> require a different transform size than would be used (optimally) to do the three
> tests independently. Also the data worked on would be different enough to ward off
> risks of data-dependent processor bugs.
>
> In practice, there may be some small time saving possible. It turns out that the
> fast multiplication algorithms are most efficiently implemented when the transform
> size is a power of 2. So, e.g., numbers around 3 million are relatively expensive
> to check (using Prime95) because they're too large for a 128K transform size but
> too small to need a 256K transform size. However, checking 3 at once would fit a
> 512K transform size reasonably well. There is, of course, a converse trend, in so
> far as larger transform sizes tend to cause more cache misses, therefore cache
> tuning for large transform size may become a significant issue.
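A back-of-envelope check of the sizing claim (my arithmetic, and the exact safe bits-per-word threshold is implementation-dependent): an n-point transform splits a p-bit number into words of roughly p/n bits, and a floating-point FFT caps how many bits per word are safe.

```python
def bits_per_word(p, n):
    """Approximate bits carried per transform word."""
    return p / n

print(f"p ~ 3.0M, 128K transform: {bits_per_word(3_000_000, 128 * 1024):.1f} bits/word")
print(f"p ~ 3.0M, 256K transform: {bits_per_word(3_000_000, 256 * 1024):.1f} bits/word")
# Three exponents near 3 million combined: a ~9M-bit modulus in 512K words.
print(f"p ~ 9.0M, 512K transform: {bits_per_word(9_000_000, 512 * 1024):.1f} bits/word")
```

The combined density (about 17.2 bits/word) falls between the too-dense 128K case (about 22.9) and the wasteful 256K case (about 11.4), which is one way to read the "fits reasonably well" claim.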
>
> My suggested "trick" applies especially to other programs, e.g. MacLucasUNIX,
> which have no intermediate transform sizes between powers of two. A couple of days ago
> my Alpha system finished double-checking M(2361701) using a 128K transform; the
> next exponent I had lined up for it was M(2361721), and it decided that a 256K
> transform is needed, thus doubling the run time 8-( (Don't forget this is
> DIFFERENT CODE to Prime95 - in fact, the Alpha FPU works to only 53 bits precision
> (IEEE G-floating), rather than the 64 bits precision which the Intel x86 FPU is
> capable of)
>
> Can anyone fault my logic? If not, then my suggested approach may possibly be more
> effective than writing extra code into MacLucasUNIX (and other LL testers) to cope
> with transform sizes intermediate between powers of 2. Also saves debugging a
> whole load of extra FFT code 8-)
>
> Regards
> Brian Beesley
Try to keep your lines below 75 characters so they still fit
on an 80-character screen after we copy and indent them.
It is valid to compute a residue modulo M(p) * M(q) * M(r)
and later reduce it modulo M(p), M(q), M(r). But how will the DWT
(Discrete Weighted Transform) modulo this product be done?
Presently, if we want a product modulo 2^p - 1 using
a convolution of length n, we scale the input using powers
of 2^(1/n), and the precise powers depend upon p.
The individual digits of the multiple-precision residue
do not all have the same base -- the pattern depends upon p.
How do you resolve these?
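The digit pattern in question can be made concrete (my sketch, after Crandall and Fagin's irrational-base DWT; treat the formula as an assumption here): with an n-word transform mod 2^p - 1, word j carries ceil(p*(j+1)/n) - ceil(p*j/n) bits, so the mixed-radix layout is tied to p.

```python
def digit_bits(p, n):
    """Bits held by each of the n words in the variable-base layout."""
    c = lambda x: (x + n - 1) // n   # exact ceil(x / n) for non-negative ints
    return [c(p * (j + 1)) - c(p * j) for j in range(n)]

print(digit_bits(31, 8))   # prints [4, 4, 4, 4, 4, 4, 4, 3]
print(digit_bits(29, 8))   # prints [4, 4, 3, 4, 4, 3, 4, 3]
```

Changing p reshuffles which words are "big" and which are "small" (the bit counts always sum to p), so a single transform sized for M(p)*M(q)*M(r) cannot reuse any one exponent's weight and digit pattern.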
Peter
________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm