Re: Mersenne: A challenge

BJ . Beesley Mon, 23 Nov 1998 15:41:43 -0500
>       I'd like to challenge the readers to produce the best algorithm
>and/or code for finding factors of Mersenne numbers greater than 64-bits.

Beat this! (I'm sure it's possible). In particular my i86 assembler is
a bit rusty, so maybe a few clocks could be shaved off by use of more
appropriate registers.

Given a divisor with more than 64 but not more than 96 significant
bits, stored in low-byte-first order in an array of 32-bit integers
pointed to by q, this procedure calculates the remainder after
dividing into 2^p-1. The result is stored in an array of (at least 3)
32-bit integers pointed to by r. Only the first three elements of the
array pointed to by r will be written to; the array pointed to by q
is not written to at all. If it is not required to preserve the value
in q, then the procedure can be called with q & r pointing to the same
array.

If the value pointed to by q has less than 65 significant bits, the
first three elements of the array pointed to by r will contain all
"1" bits on exit. Note that this can never be a valid remainder.

Dependencies:

(a) A 32-bit version of Microsoft Visual C++, or something compatible
    (MS VC++ 5.0 used for development)
(b) object processor, Intel 80386 compatible
(c) Microsoft __cdecl calling convention used by caller
(d) (I think) DS = SS is assumed...

Features:

(a) Timing, (normal exit) proportional to p (i.e. to logarithm of
number to be factored), but effectively independent of (legal) values
of *q. Approx. 12.5 * p / CPU speed(Hz), measured on Pentium 100
Classic, i.e. 0.625 sec for p ~ 5,000,000. This is about one quarter
the speed of the algorithm currently used by Prime95 for divisors up
to 62 bits in length.

(b) Manages to totally avoid memory access of data in the core loop
of the code (from label L3 to the conditional jump back to L3).
Branching is reduced to an absolute minimum.

(c) Does not use FPU at all. (If we're being *really* clever, perhaps
we could run some FPU-intensive code in parallel!)

(d) Algorithm easily implemented for any other CPU type with "normal"
32-bit arithmetic & logical operations.

(e) Gives an (almost) free test for divisibility of 2^p+k by *q - e.g.
if 2^p+1 is divisible by *q, then, on exit, *r will contain a value
which will be 2 less than (the entry value of) *q.

(f) Quotient is not stored by this procedure. Doing this would cause
a significant reduction in speed, not justified for trial factoring.

Copyright (c) Brian Beesley   23-Nov-1998
GNU Public Licence conditions apply

....................................................................

void remainder96(unsigned int p,unsigned int * q,unsigned int * r)
{
__asm
{
        push    esi
        push    edi

        mov     ebx,q
        mov     esi,-1
        mov     edi,esi

        mov     eax,32
        mov     edx,[ebx+8]

L1:     and     edx,edx
        js      L2
        dec     eax
        jz      errex
        shl     edx,1
        shr     edi,1
        jmp     L1

L2:     add     eax,64
        neg     eax
        add     eax,p
        jnc     toosml
        mov     ecx,eax

        push    ebp
        mov     ebp,[ebx+8]
        mov     edx,[ebx+4]
        mov     ebx,[ebx]

        mov     eax,esi
        sub     eax,ebx
        sbb     esi,edx
        sbb     edi,ebp
                
L3:     stc
        rcl     eax,1
        rcl     esi,1
        rcl     edi,1

        sub     eax,ebx
        sbb     esi,edx
        sbb     edi,ebp
        jnc     L4

        add     eax,ebx
        adc     esi,edx
        adc     edi,ebp

L4:     dec     ecx
        jnz     L3

        pop     ebp
        mov     ebx,r
        mov     [ebx],eax
        mov     [ebx+4],esi
        mov     [ebx+8],edi
        jmp     exitOK

errex:  mov     ebx,r
        mov     [ebx],esi
        mov     [ebx+2],esi
        mov     [ebx+4],esi
        jmp     exitOK

toosml: mov     ebx,q
        mov     eax,[ebx]
        mov     ecx,[ebx+4]
        mov     edx,[ebx+8]
        mov     ebx,r
        mov     [ebx],eax
        mov     [ebx+4],ecx
        mov     [ebx+8],edx
        
exitOK: pop     edi
        pop     esi
}
}

....................................................................
Re: Mersenne: A challenge

Reply via email to