In case anyone is interested, I just spent the last week trying to implement a "straightforward" version of Moller's half gcd algorithm for asymptotically fast integer GCD:
http://www.lysator.liu.se/~nisse/archive/S0025-5718-07-02017-0.pdf Here it is: https://github.com/wbhart/ccas/blob/master/nn/gcd_ngcd.c https://github.com/wbhart/ccas/blob/master/nn/ngcd.c https://github.com/wbhart/ccas/blob/master/nn/ngcd_sdiv.c https://github.com/wbhart/ccas/blob/master/nn/ngcd_sdiv_cmp.c https://github.com/wbhart/ccas/blob/master/nn/ngcd_mat_apply.c https://github.com/wbhart/ccas/blob/master/nn/ngcd_mat_update.c https://github.com/wbhart/ccas/blob/master/nn/ngcd_mat_mul.c https://github.com/wbhart/ccas/blob/master/nn/ngcd_mat_identity.c https://github.com/wbhart/ccas/blob/master/nn/ngcd_mat_init.c https://github.com/wbhart/ccas/blob/master/nn/ngcd_mat_clear.c The main things I do differently to the paper are: 1) I break on limb boundaries instead of bit boundaries (the main theorems in Moller's paper still hold in this case) 2) I always assume that that m >= n when taking gcd of {a, m} and {b, n} where m and n are the number of words 3) I try to only deal with nonnegative integers throughout the implementation 4) I prevent the algorithm from recursing too deep, which it will do without the additional tests I've added in ngcd.c (which otherwise looks like the pseudocode in Moller's paper). (Without being careful of this, I found the complexity to be more like cubic than quasilinear :-)) I've done lots of testing and I am fairly sure if there are any bugs left, they are fairly obscure. It's about 10x slower than the much more complicated version in GMP/MPIR. Their implementation does the following things which are not done in my implementation: * my integer multiplication code is entirely written in C, so is 5-10x slower (up to about a billion bits) * they optimise division, which is usually just a subtraction when done in gcd algorithms, as all the terms in the remainder sequence are usually only 1 or 2 bits bigger than the next * they fall back to a fast Lehmer gcd, I fall back to the standard Euclidean algorithm * they use 2x2 Strassen matrix multiplication * they do lots of minor optimisations, such as allocating all memory up front and avoiding copies, etc. Note that the crossover from Euclidean is only 20 words on a 64 bit machine! Bill. -- You received this message because you are subscribed to the Google Groups "mpir-devel" group. To unsubscribe from this group and stop receiving emails from it, send an email to mpir-devel+unsubscr...@googlegroups.com. To post to this group, send email to mpir-devel@googlegroups.com. Visit this group at http://groups.google.com/group/mpir-devel. For more options, visit https://groups.google.com/d/optout.