Re: Mersenne: Understanding the Propagate Carry Step in the LL Test

Peter-Lawrence . Montgomery Fri, 05 Jan 2001 00:14:02 -0800
    Fix a length n.  If (a0, a1, ..., a_(n-1)) and
(b0, b1, ..., b_(n-1)) are sequences of length n
(in some ring or field), then their circular convolution
is (c0, c1, ..., c_(n-1)) where

      c_i = sum_{i == j + k mod n}  a_j * b_k

For example, when n = 5, the circular convolution of
(a0, a1, a2, a3, a4) and (b0, b1, b2, b3, b4) is
(c0, c1, c2, c3, c4) where

      c0 = a0*b0 + a1*b4 + a2*b3 + a3*b2 + a4*b1
      c1 = a0*b1 + a1*b0 + a2*b4 + a3*b3 + a4*b2
      c2 = a0*b2 + a1*b1 + a2*b0 + a3*b4 + a4*b3
      c3 = a0*b3 + a1*b2 + a2*b1 + a3*b0 + a4*b4
      c4 = a0*b4 + a1*b3 + a2*b2 + a3*b1 + a4*b0

    The circular convolution is useful for polynomial
multiplication.  Here, if
      
      A(X) = a0 + a1*X + a2*X^2 + a3*X^3 + a4*X^4
      B(X) = b0 + b1*X + b2*X^2 + b3*X^3 + b4*X^4
      C(X) = c0 + c1*X + c2*X^2 + c3*X^3 + c4*X^4

then 

      A(X)*B(X) = C(X) + (X^5 -   1)(a1*b4 + a2*b3 + a3*b2 + a4*b1)
                       + (X^6 -   X)(a2*b4 + a3*b3 + a4*b2)
                       + (X^7 - X^2)(a3*b4 + a4*b3)
                       + (X^8 - X^3)(a4*b4)
                == C(X)   (mod X^5 - 1)

In general C(X) == A(X)*B(X) mod (X^n - 1).
By choosing n > deg(A) + deg(B) (i.e., padding the input 
polynomials with leading zero coefficients) 
one can get their product exactly.

    At first glance the computation of the ci's seems
to require n^2 multiplications and n(n-1) additions.
Techniques which I won't describe reduce this cost to O(n*log n) for large n.  

    If all ai and bi are integers, then so are the ci.
An upper on the ci is easily derived from upper bounds
on the ai and bi.  Convolution algorithms which use 
floating point (i.e., approximate) arithmetic 
can estimate the largest potential error in the ci.
If this potential error is under 0.5, then one can
round the approximate ci to the nearest integers,
ensured one will get the correct results.

--------- Large integer multiplication

    When p is prime, the Lucas-Lehmer test for Mp = 2^p - 1
needs many squarings modulo Mp.  How do we use the
circular convolution to assist us?

    The task is to multiply two integers a, b in [0, Mp)
(or the closed interval [0, Mp] if we allow two 
representations for 0 mod Mp), getting another such result.  
We know the binary representations of a and b.

    The textbook strategy first computes the double-length product a*b.
later reducing this modulo Mp.  Choose a radix R (a power of 2)
and a length n such that R^n > Mp.  Write the inputs in radix R:

       a = a0 + a1*R + ... + a_(n-1)*R^(n-1)
       b = b0 + b1*R + ... + b_(n-1)*R^(n-1)

where all ai and bi are in [0, R-1). 
[Or use signed digits in [-R/2, R/2].
Set ai = 0 and bi = 0 for n <= i < 2n.
Use a length-(2n) convolution to multiply the 
corresponding polynomials A(X) and B(X) modulo X^(2n) - 1.
This will give the exact polynomial product since
deg(A) + deg(B) <= (n-1) + (n-1) < 2n.
The coefficients of this product are in [0, n*(R-1)^2] --
the convolution algorithm must produce accurate outputs
up to about n*R^2.

    Let the output coefficients be 
(c0, c1, ..., c_(2n-2), c_(2n-1)) where c_(2n-1) = 0.
To get the integer product c = a*b we set X = R
in the polynomial identity C(X) = A(X)*B(X).  

    Alas, the digits of C can exceed R-1.
We circumvent this by adding an appropriate multiple
of X-R to C(X).  The loop resembles

        carry := 0
        for j from 0 to 2n-2 do
            sum := carry + c_j
            carry := truncate(sum / R)
            c_j := sum - carry*R
        end for
        c_(2n-1) := carry

Now one can read off the product a*b, in radix R.

------------ Multiplication modulo Mp

    Given a, b in [0, Mp-1], represented in radix R,
the last section describes how to get their product a*b in radix R.
Reduction modulo Mp is easy when R is a power of 2.

    This algorithm uses a length-(2n) convolution where R^n > Mp.  
It seems wasteful to insert all those leading zeros into the 
(a0, ... a_(n-1)) and (b0, ..., b_(n-1)) sequences?  Can we avoid this?

    The answer is yes if R^n - 1 = Mp, i.e., if R = 2^(p/n).
Let's look at this case in detail.  We write a and b in radix R,
with digits in [0, n-1].  Take the polynomial product
modulo X^n - 1 (rather than modulo X^(2n) - 1).  Then let X = R.  
The full polynomial product is not obtained
(only the product modulo X^n - 1).  
When we substitute X = R, the resulting integer may be off 
by a multiple of R^n - 1 = 2^p - 1 = Mp, but that is OK.
The coefficients in the polynomial product modulo X^n - 1
will not exceed n*(R-1)^2.  

    This seems to require that R be an integer, 
which is possible only if p/n be an integer.
When p is prime, this allows only the cases R = 2
and R = 2^p.  Neither is convenient to use.

    The Crandall-Fagin trick allows this to work even when 
p does not divide p, using an irrational radix R = 2^(p/n).  
We illustrate with n = 5 and p = 5k + 2.
Given an integer a in [0, 2^p - 1], expand it as

       a = a0' + a1'*2^(k+1) + a2'*2^(2k+1) + a3'*2*(3k+2) + a4'*2^(4k+2)

where a0', a1', a2', a3', a4' are integers in [0, 2^k - 1] or [0, 2^(k+1) - 1].
[The exponents are CEIL(0*p/5), CEIL(1*p/5), ..., CEIL(4*p/5).]  Then

       a = a0 + a1*R + a2*R^2 + a3*R^3 + a4*R^4

where
       a0 = a0'
       a1 = a1' * 2^(3/5)
       a2 = a2' * 2^(1/5)
       a3 = a3' * 2^(4/5)
       a4 = a4' * 2^(2/5)

The a_i are bounded by 2^(k + 1 + 4/5)  (better bounds are easily found).
To multiply a*b mod Mp where 0 <= b <= Mp, get a similar expansion

       b = b0 + b1*R + b2*R^2 + b3*R^3 + b4*R^4

for b, where b0, b1, b2, b3, b4 are multiplies of 1, 2^(3/5),
2^(1/5), 2^(4/5), 2^(2/5) respectively, bounded by 2^(k + 1 + 4/5).

    Form the convolution product (c0, c1, c2, c3, c4) of
(a0, a1, a2, a3, a4) and (b0, b1, b2, b3, b4). 
>From the definition of the convolution product, one verifies
that 0 <= c_i < 5*2^(2k + 18/5).  Furthermore, if one defines

       c0' = c0
       c1' = c1 * 2^(-3/5)
       c2' = c2 * 2^(-1/5)
       c3' = c3 * 2^(-4/5)
       c4' = c4 * 2^(-2/5)

then the c_i' are integers.  A convolution algorithm using
floating point arithmetic can delay rounding its outputs
until it has approximations to c0' to c4' (rather than c0 to c4).

    The output (c0, c1, c2, c3, c4) is a mixed-radix 
(radices 2^k and 2^(k+1)) representation of some integer 
congruent to a*b modulo Mp,
with digits bounded by a small multiple of 2^(2*k).
The carry propagation step reduces these digits below 2^k 
(or below 2^(k+1)).  Something like (when using signed digits):
  
        carry := 0             (Or carry := -2. for LL test).

        c0 := carry + round(c0' * 2^(-0/5))
        carry := round(c0 / 2^(k+1))
        c0 := c0 - carry * 2^(k+1)

        c1 := carry + round(c1' * 2^(-3/5))
        carry := round(c1 / 2^k)
        c1 := c1 - carry * 2^k

        c2 := carry + round(c2' * 2^(-1/5))
        carry := round(c2 / 2^(k+1) )
        c2 := c2 - carry * 2^(k+1)

        c3 := carry + round(c3' * 2^(-4/5))
        carry := round(c3 / 2^k)
        c3 := c3 - carry * 2^k

        c4 := carry + round(c4' * 2^(-2/5))
        carry := round(c4 / 2^k)
        c4 := c4 - carry * 2^k


The final carry is bounded by a small multiple of 2^k.
It has weight 2^(5k + 2) = 2^p.  Since we are
working modulo Mp = 2^p - 1, we can give this carry-out
a weight of 1, adding it to c0.  The next carry (to c1) 
is bounded by O(1) -- we'll rarely need to go beyond there.


---------

> From: "Brindle, Gordon" <[EMAIL PROTECTED]>

Hi Everyone and a Happy New Year to you all,

I wondered if anyone can help me understand (perhaps by way of an example)
how the Propagate Carry step works in the Crandall/Fagin LL test.  I have
browsed the archives of this list but can't find anything that fully answers
my question. I would like to implement a LL test in Mathematica along the
lines of the partial example given by Crandall in Chapter 3 of "Topics in
Advanced Scientific Computation" p94 but I am having difficulty
understanding this step (I've read the Crandall/Fagin 1994 paper, too). I
believe this step is what Crandall refers to as "add-with-carry, immediately
after the weighted convolution" on p96 of his book. My interest is in trying
to gain a mathematical understanding of all the steps for a practical
implementation of the LL test using DWT (so that I can extend Crandall's
example into a fully-fledged LL test, albeit to work with small exponents).

If you can help me to understand this step I should be most grateful,

Regards,

Gordon.

Gordon Brindle
E [EMAIL PROTECTED]



_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers
Re: Mersenne: Understanding the Propagate Carry Step in the LL Test

Reply via email to