Peter Foster asks:

> When doing an LL test we are always calculating the 
> same series of numbers. The modulus is different, so 
> we can't use the result of one test to help with 
> another. I'm wondering why we don't do the following:
> 
> Take 2 Mersenne numbers, m1 and m2 (m1 < m2).
> Do the usual LL series, but use as the modulus m1*m2.
> At the appropriate step, check if the remainder is 
> divisible by m1. If so, then m1 is prime.
> At the end, check if the remainder is divisible by 
> m2. If so, then m2 is prime.
> 
> This allows us to do two (or more) tests for the price 
> of one. What is the obvious reason we don't do this?
 
     In theory this is fine.  But it is likely to cost more
than making two separate tests.
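To make the proposal concrete, here is a toy Python sketch of the combined test (my own naming and structure, not from the post; a bignum model, not an efficient implementation):

```python
def combined_ll(p1, p2):
    """Run one Lucas-Lehmer iteration sequence modulo m1*m2, testing
    both m1 = 2^p1 - 1 and m2 = 2^p2 - 1.  Assumes 2 < p1 < p2."""
    m1, m2 = (1 << p1) - 1, (1 << p2) - 1
    n = m1 * m2
    s = 4                        # s_0 of the Lucas-Lehmer sequence
    results = {}
    for k in range(1, p2 - 1):   # compute s_1 .. s_{p2-2}
        s = (s * s - 2) % n
        if k == p1 - 2:          # s_{p1-2} mod m1 decides m1
            results[p1] = (s % m1 == 0)
        if k == p2 - 2:          # s_{p2-2} mod m2 decides m2
            results[p2] = (s % m2 == 0)
    return results

print(combined_ll(5, 7))    # 2^5-1 = 31 and 2^7-1 = 127 are both prime
print(combined_ll(11, 13))  # 2^11-1 = 2047 = 23*89 is composite
```

Since m1 divides m1*m2, reducing modulo m1*m2 and then modulo m1 gives the same residue as working modulo m1 throughout, which is why the divisibility checks are valid.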


     Suppose m1 = 2^p1 - 1 and m2 = 2^p2 - 1.
Residues modulo m1 have p1 bits, and those modulo m2 have p2 bits.
The time to square modulo m1 is estimated as O(p1*log(p1)).
The time to square modulo m2 is estimated as O(p2*log(p2)).
The time to square modulo m1*m2 is estimated as O((p1+p2)*log(p1+p2)).

    We'll assume p1 and p2 are large but close in magnitude.
For example, each might be between 6332000 and 6333000.
Then log(p2) - log(p1) = log(p2/p1) is tiny compared to
log(p2) or log(p1), so we approximate log(p2) = log(p1) and
log(p1+p2) = log(2*p1) = log(2) + log(p1).  

    The combined time to separately square modulo m1 and modulo m2
becomes O((p1+p2)*log(p1)).  A single computation modulo m1*m2 takes
time O((p1+p2)*(log(2) + log(p1))).  For p1, p2 near 6.3 million ~= 2^22.6,
the latter time is about 4.4% longer.
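The relative overhead in this estimate is just log(2)/log(p1), i.e. roughly 4 to 5 percent for exponents in the range above; a quick check:

```python
import math

p1 = 6_332_000                              # an exponent from the range above
separate = 2 * p1 * math.log2(p1)           # two separate squarings, mod m1 and mod m2
combined = 2 * p1 * (1 + math.log2(p1))     # one squaring mod m1*m2
overhead = combined / separate - 1          # equals 1 / log2(p1)
print(f"{overhead:.1%}")
```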

    This estimate is slightly optimistic.  The techniques used
for reduction modulo 2^p1 - 1 and 2^p2 - 1 don't immediately generalize
to reduction modulo (2^p1 - 1)*(2^p2 - 1).
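For reference, the special technique alluded to: modulo 2^p - 1, the bits above position p simply fold back into the low bits, so no division is needed.  A minimal illustration (not the FFT-based reduction real LL testers use):

```python
def mod_mersenne(x, p):
    """Reduce a non-negative integer x modulo 2^p - 1 with shifts only."""
    m = (1 << p) - 1
    while x > m:
        x = (x >> p) + (x & m)   # fold the high bits back into the low bits
    return 0 if x == m else x

print(mod_mersenne(37634, 5), 37634 % 31)  # both 0
```

No comparably cheap folding identity is available for the composite modulus (2^p1 - 1)*(2^p2 - 1), which is why the single-modulus estimate above is optimistic.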

    Despite this, it may be beneficial to write a program which
does several nearby exponents at once.  Consider a SIMD
(single instruction, multiple data) machine with n processors.
Each of the n processors can be assigned a different exponent.
All use the same FFT length, operating on local data unique
to the processor's exponent.  The larger exponents will need a few
more LL iterations than the smaller exponents, but this gap 
in amount of work per processor will typically be under 1%.
This scheme would require considerable local memory on each processor.
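A toy serial model of that SIMD scheme (each "lane" i keeps its own residue s[i] and modulus; lanes with smaller exponents idle once their test finishes; the names are mine):

```python
def lockstep_ll(exps):
    """Lucas-Lehmer for several exponents, advanced in lock step."""
    mods = [(1 << p) - 1 for p in exps]
    s = [4] * len(exps)              # per-lane Lucas-Lehmer state
    verdict = {}
    for k in range(1, max(exps) - 1):
        for i, (p, m) in enumerate(zip(exps, mods)):
            if k <= p - 2:           # this lane still has work at step k
                s[i] = (s[i] * s[i] - 2) % m
                if k == p - 2:
                    verdict[p] = (s[i] == 0)
    return verdict

print(lockstep_ll([5, 7, 11, 13]))
```

On real SIMD hardware the inner loop over lanes would be one vector instruction, and with nearby large exponents the idle fraction per lane stays tiny, as noted above.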

     Peter Montgomery


