> Your goal is to come up with an algorithm in pseudo-code that computes
> a * a mod f where a and f are between 65 and 80 bits.
>
80 bits seems too large for factoring (I hope this isn't the break even
point!). If 2*k*p+1=2^80 with p=10000000 then k=6e16.
This is what I suggest
1. If a potential divisor is N. Use Newton's method to find 2^s/N for s
sufficiently large. Note this should be an integer. Newton's method
finds the reciprocal of a number. Start with an approximation x{0} then
x{n+1} = 2 * x{n} - v * x{n}^2
The correct number of digits doubles each iteration. This can be done
without floating point arthimetic.
x{n+1} = 2 * x{n} - v * x{n}^2 / 2^s
example. N = 7 and 2^20/7 = 149796
For first approximation guess x{0} = 100000
x{1} = 2 * 100000 - 7 * 100000^2 / 2^20 = 200000 - 66757 = 133243
x{2} = 2 * 133243 - 7 * 133243^2 / 2^20 = 266486 - 118518 = 147968
x{3} = 2 * 147968 - 7 * 147968^2 / 2^20 = 295936 - 146161 = 149775
x{4} = 2 * 149775 - 7 * 149775^2 / 2^20 = 299550 - 149753 = 149797
Then to test a divisor requires lg p squares mod N (lg is binary log or
ln p/ln 2). Multiply the square by 2^s/N then shift s bits right and
multiply by N and subtract from original number to find the remainder.
example. N = 7 and 2^20/7 = 149796 (Here s=20 can be smaller)
a = 123
d = a * 2^20/N = 18424908
d / 2^20 = 17 (quotient)
17 * 7 = 119
123 - 119 = 4
and 123 mod 7 == 4. This requires a multiprecision multiplication,
square, and subtract. On the P-II a multiplication is 4 cycles.
The initial approximation x{0} in Newton's method can use the approximation
of the previous factor tried or since this algorithm is all-integer the
previous computation can use fdiv to find an approximation for free. This
approximation could be within 62 bits, requiring only a few iterations of
Newton's method.
2. Use trial-factoring then P-1. Ernst Mayer has already shown that a
fast GCD is possible. Even a one stage P-1 would be more efficient than
trial-factoring.