In case anyone is interested, I just spent the last week trying to
implement a "straightforward" version of Moller's half gcd algorithm for
asymptotically fast integer GCD:

http://www.lysator.liu.se/~nisse/archive/S0025-5718-07-02017-0.pdf

Here it is:


https://github.com/wbhart/ccas/blob/master/nn/gcd_ngcd.c
https://github.com/wbhart/ccas/blob/master/nn/ngcd.c
https://github.com/wbhart/ccas/blob/master/nn/ngcd_sdiv.c
https://github.com/wbhart/ccas/blob/master/nn/ngcd_sdiv_cmp.c
https://github.com/wbhart/ccas/blob/master/nn/ngcd_mat_apply.c
https://github.com/wbhart/ccas/blob/master/nn/ngcd_mat_update.c
https://github.com/wbhart/ccas/blob/master/nn/ngcd_mat_mul.c
https://github.com/wbhart/ccas/blob/master/nn/ngcd_mat_identity.c
https://github.com/wbhart/ccas/blob/master/nn/ngcd_mat_init.c
https://github.com/wbhart/ccas/blob/master/nn/ngcd_mat_clear.c

The main things I do differently to the paper are:

1) I break on limb boundaries instead of bit boundaries (the main theorems
in Moller's paper still hold in this case)

2) I always assume that  that m >= n when taking gcd of {a, m} and {b, n}
where m and n are the number of words

3) I try to only deal with nonnegative integers throughout the
implementation

4) I prevent the algorithm from recursing too deep, which it will do
without the additional tests I've added in ngcd.c (which otherwise looks
like the pseudocode in Moller's paper). (Without being careful of this, I
found the complexity to be more like cubic than quasilinear :-))

I've done lots of testing and I am fairly sure if there are any bugs left,
they are fairly obscure.

It's about 10x slower than the much more complicated version in GMP/MPIR.
Their implementation does the following things which are not done in my
implementation:

* my integer multiplication code is entirely written in C, so is 5-10x
slower (up to about a billion bits)

* they optimise division, which is usually just a subtraction when done in
gcd algorithms, as all the terms in the remainder sequence are usually only
1 or 2 bits bigger than the next

* they fall back to a fast Lehmer gcd, I fall back to the standard
Euclidean algorithm

* they use 2x2 Strassen matrix multiplication

* they do lots of minor optimisations, such as allocating all memory up
front and avoiding copies, etc.

Note that the crossover from Euclidean is only 20 words on a 64 bit machine!

Bill.

-- 
You received this message because you are subscribed to the Google Groups 
"mpir-devel" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to mpir-devel+unsubscr...@googlegroups.com.
To post to this group, send email to mpir-devel@googlegroups.com.
Visit this group at http://groups.google.com/group/mpir-devel.
For more options, visit https://groups.google.com/d/optout.

Reply via email to