Just a quick note. I don't think it makes much sense to handle two 128-bit words in this code. In fact, the use of uintmax_t was a mistake; it should use "unsigned long" or "unsigned long long", whichever is efficiently supported directly by the hardware.
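
To make the word-size point concrete, here is a minimal sketch of a modular multiplication that stays within types the hardware handles directly: a 32-bit word with a 64-bit intermediate product, reduced with Montgomery's method. This is an illustration only, not the code in coreutils or GMP; the function names (inverse_mod_2_32, redc, mulmod) are hypothetical, and the modulus is assumed to be odd.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: n^{-1} mod 2^32 for odd n, by Newton iteration.
   The starting value n is correct to 3 bits, and each step doubles
   the number of correct bits (3 -> 6 -> 12 -> 24 -> 48). */
static uint32_t inverse_mod_2_32(uint32_t n)
{
    uint32_t inv = n;
    for (int i = 0; i < 4; i++)
        inv *= 2u - n * inv;
    return inv;
}

/* Montgomery reduction: for t < n * 2^32, return t * 2^{-32} mod n.
   Only 32x32->64 bit multiplies are used; no division by n. */
static uint32_t redc(uint64_t t, uint32_t n, uint32_t ninv)
{
    uint32_t m = (uint32_t)t * ninv;                    /* m*n == t (mod 2^32) */
    uint32_t mn_hi = (uint32_t)(((uint64_t)m * n) >> 32);
    uint32_t t_hi = (uint32_t)(t >> 32);
    uint32_t r = t_hi - mn_hi;                          /* low halves cancel */
    if (t_hi < mn_hi)
        r += n;                                         /* bring into [0, n) */
    return r;
}

/* a * b mod n for odd n and a, b < n, via Montgomery form. */
static uint32_t mulmod(uint32_t a, uint32_t b, uint32_t n)
{
    assert(n & 1u);
    assert(a < n && b < n);
    uint32_t ninv = inverse_mod_2_32(n);
    /* Conversion to Montgomery form (x * 2^32 mod n). A plain 64-bit
       remainder is used here for brevity; in a real factoring loop the
       conversion cost is amortised over many multiplications. */
    uint32_t abar = (uint32_t)(((uint64_t)a << 32) % n);
    uint32_t bbar = (uint32_t)(((uint64_t)b << 32) % n);
    uint32_t cbar = redc((uint64_t)abar * bbar, n, ninv);
    return redc(cbar, n, ninv);                         /* leave Montgomery form */
}
```

Widening the word beyond what the hardware multiplies natively means each of these products has to be assembled from several smaller ones, which is where the inefficiency comes from.
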

While uintmax_t could also be made to work for the cases you report, it would be inefficient.

A few months ago, I contributed a suggested new core factoring function to GNU coreutils, which is, IIRC, 2-3 times faster than the current code in the bignum range. It uses GMP's mpn functions with "Montgomery multiplication".

There were two problems with my new core code:

1. It will not work with mini-GMP, which I think is undesirable.
2. It did not optimise for smaller factoring tasks. The fix for this would be to merge that handling from the current coreutils factor.c.

-- 
Torbjörn
Please encrypt, key id 0xC8601622