ni...@lysator.liu.se (Niels Möller) writes: t...@gmplib.org (Torbjörn Granlund) writes:
> Looking at when method 1 or 3 is faster than 2 is more interesting. > Method 1 and to some extent also method 3 would benefit from asm code, How would method 1 benefit from asm code? To use instructions providing both quotient and remainder? For x86_64, it looks like gcc uses xor %edx, %edx div %rcx to get quotient and remainder (i.e., using the 128/64 division instruction; there seems there is no 64/64 divison instruction?). I meant to write method 2 and 3. Method 2 and 3 begs for using subtract + cmov, not sure if the compiler is clever enough to do that for the code using masks. The compiler is not that clever, it seems. > Would your present tuneup/speed setup allow measuring of asm code? I think so, ./speed mpn_hgcd2 will measure the mpn_hgcd2 function in the library, with no extra hacks. Will maybe need some tricks to be able to select other methods and ignore the asm version, but I'd expect that to be fairly easy. Well, one will at least need to suppress the local versions by editing hgcd2.c. No? > The current div1 measurements include hgcd2's own time, right? I.e., if > we found a div1 which runs in zero cycles, the timings would not be > zero. Yes. Do you think it makes sense to measure div1 in isolation? Since performance depends so much on the distribution of the inputs, measuring hgcd2 makes sense to me. I think I agree. The distribution of operands are the same as for a plain, old Euclid, isn't it? So one could perhaps provide a set of uniformly distribted random numbers for div1 and get adequate results? I'm appending another iteration of the patch to add div2 function based on div1 on the high limbs. Selected via HGCD2_DIV2_METHOD. Benchmarks: HGCD2_DIV2_METHOD mpn_hgcd2_1 mpn_hgcd2_2 mpn_hgcd2_3 1 #1504.47 1740.53 1513.56 div1 + unlikely case 2 #1739.66 1858.01 1753.73 current code 3 #1615.32 1736.69 1621.95 the if:ed out version Looks good. Ready for repo? What is the current reasoning on suppressing the q = 1 code in the hgcd2 code calling div1 and div2? That's largely orthogomal to the current HGCD2_DIV*_METHOD choices, I suppose, although surely q = 1 special handling makes more sense for HGCD2_DIV1_METHOD = 1 that for the other methods. A combinatoric explosion would be nice to avoid. -- Torbjörn Please encrypt, key id 0xC8601622 _______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel