Blosser, Jeremy writes:

   So I ran JavaLucas with an FFT of 256k.  It's getting 1.75
   sec/iteration.  So...  not too shabby, really...

Not bad at all.

   1) What's the algorithm for finding the best FFT size?  I was doing:
   [...] 
   So I looked at MacLucasUNIX, and it has:

        while (n < ((BIG_DOUBLE)q)/ROUNDED_HALF_MANTISSA + 4.0)
                n <<= 1;
        size = n + ((n & MASK) >> SHIFT);

   What I can't find is where the ROUNDED_HALF_MANTISSA came from, and
   why the mask and shift operation?

ROUNDED_HALF_MANTISSA in MacLucasUNIX is mis-named, really; as I
recall (and I didn't write the original use of it), it's just a
constant based indirectly on the number of bits in the mantissa of the
doubles being used.  You'll find the declaration in setup.h, based on
the define of BIG_DOUBLE.

The mask and shift is a "trick" (that I believe George Woltman passed
on to John Sweeney) to leave gaps in the arrays to improve cache
performance.  See the file "Wisdom" which John Sweeney wrote and I
pass on as part of the mers distribution.  I have no idea whether it
will help or hurt Java performance, but there are some hooks in the
mers package code to use it (MacLucasUNIX) or turn it off (mersenne1)
in some of the functions that use the primary array.

   2) I'm looking at the code, in Lucac.c and it looks like:
        a) creates the scrambled array: scrambled[j] = n[permute[j-1]] *
   two_to_phi[permute[j-1]]
                (not quite sure why)
        b) it converts the scrambled array to fft, squares it, does the
   inverse fft...
        c) multiplies the scrambled array * two_to_minusphi
                (Is this the mod here???)
        d) calls addsignal
                (this one has me lost)
        e) calls patch
                (carry?)

I believe this is simpler in both mersenne1.c and MacLucasUNIX.c,
though it's been a while since I've looked at any of them this closely
and I myself don't fully understand FFTs.

   So, my question is, where is the subtraction?
        n = (n*n-2) % (2**p-1)
   I see the square, I see (I think) the mod... where's the -2?

MacLucasUNIX.c does it in normalize().  Mersenne1.c does it in main().

   3)  I was thinking in the shower this morning... (Odd, ain't it?)

   And for all n < 2**p-1, the answer will always be n*n-2, right?  Also, if n <
   sqrt(MAXLONG), then we can do it in place without the FFT...
   Just looking at some data sets, the odds of n being less than sqrt(MAXLONG)
   are actually not too bad... (especially in Java, which has 64-bit longs)...
   So where does this shortcut fit in?  (With a big FFT, this can be a big
   difference (I think))...

I'm not sure, but I'd guess that you've goofed somewhere.  For
reasonably large p (in this case, anything over about 100), the chance
of n being less than 33 bits should be minuscule (roughly 2**(32-p)
per iteration, since the residues behave like random values mod
2**p-1), except, of course, during the first several iterations.  But
perhaps I'm misunderstanding what you're saying.

And even when I tried the "obviously-it-will-work" change to start at
194 or 14 (rather than 4) in mersenne1 and MacLucasUNIX, there were
problems on some computers and compilers.

                                                        Will

http://www.garlic.com/~wedgingt/mers.tar.gz
                                mers.tgz
                                mers.zip
                                mersenne.html
________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm