Interesting. I see that this paper compares to NTL as well. I spent the morning seeing what I could do to improve the situation for NTL, whose mul routine has essentially the same functionality as mpz_mul (takes care of memory allocation and signs). I reduced some overheads and for small inputs just called this function:
static inline limb_t
base_mul (limb_t* rp, const limb_t* up, long un, const limb_t* vp, long vn)
{
   rp[un] = mpn_mul_1 (rp, up, un, vp[0]);
   while (--vn >= 1) {
      rp += 1, vp += 1;
      rp[un] = mpn_addmul_1 (rp, up, un, vp[0]);
   }
   return rp[un];
}

At least this has the virtue of using only public GMP functions, and it
avoids one level of function call. This gave me a bit of a speedup
(1.35-1.40x for 2-3 limbs). Beyond 4 limbs, I don't get much of an
improvement at all, so I cut it off at 4. I also special-case single-limb
inputs (getting a 2.5x speedup for those).

Within my mul routine, I also detect if the two inputs point to the same
bigint. If so, I directly call mpn_sqr, rather than going through mpn_mul
and letting it make that test (but I also special-case one-limb inputs and
inline that). This also speeds things up nicely for small inputs (up to 4
limbs). Looking at the current code for mpn_sqr, it looks like it branches
immediately to mpn_sqr_basecase for small inputs.

Anyway, that's my

> On Apr 19, 2018, at 10:14 PM, Fredrik Johansson
> <fredrik.johans...@gmail.com> wrote:
>
> For operands with 1-4 limbs, that is; on my machine, mpn_mul takes up to
> twice as long as mpn_mul_basecase, and inline assembly for 1x1, 2x1 or 2x2
> multiplication is even faster. The problem is that there are three function
> calls (mpn_mul -> mpn_mul_n -> mpn_mul_basecase) + branches between the
> user code and GMP's lightning-fast assembly code.
>
> I was reminded of this old issue when seeing this new paper on arXiv:
> https://arxiv.org/abs/1804.07236. Here, the author benchmarked a C++
> implementation of bignum arithmetic against mpn_mul for small operand sizes
> and came to the conclusion that the former approach performs better than
> hand-optimized assembly (one wishes that compilers really were that clever
> about bignum code by now!).
> Some advanced GMP users (including myself) know about the issue and simply
> avoid mpn_mul for performance-critical code with short operands. The most
> convenient solution is to call mpn_mul_basecase directly instead of
> mpn_mul. Unfortunately, mpn_mul_basecase is not public, so this is a bit
> iffy to rely on. One feature request would be to simply make
> mpn_mul_basecase / mpn_sqr_basecase public.
>
> It should also be possible to speed up mpn_mul. A possible optimization
> would be to call mpn_mul_basecase directly in mpn_mul without taking the
> detour through mpn_mul_n for small balanced multiplications. A more
> aggressive optimization would be to change the architecture of mpn_mul so
> that assembly code is reached directly. I believe at least some of the
> assembly implementations of mpn_mul_basecase already do something like this:
>
> case 1x1: ...
> case 2x1: ...
> case 2x2: ...
> generic case: (basecase loop)
>
> It would be possible to have mpn_mul itself assembly-coded to do something
> like this:
>
> case 1x1: ...
> case 2x1: ...
> case 2x2: ...
> generic case, small n: (basecase loop)
> generic case, large n: (fall back to calling an mpn_mul_generic function
> that selects between different algorithms)
>
> An even more ambitious change would be to provide some kind of interface in
> GMP for arithmetic with minimal overhead when the number of limbs is a
> small compile-time constant. For example, such an interface could consist
> of macros of the type mpn_op_nxm for op = add/sub, mul and addmul/submul
> and different small (n,m). Of course, such macros could simply call the
> normal mpn functions as a generic fallback if an optimized assembly block
> for that size is not available. Another possibility for an interface would
> be to detect compile-time constants in the function arguments to the usual
> mpn functions (with the help of macros), but that might be too much magic...
>
> Best,
> Fredrik
>
> _______________________________________________
> gmp-devel mailing list
> gmp-devel@gmplib.org
> https://gmplib.org/mailman/listinfo/gmp-devel