For two days now I've been trying to find out why the 8192 x 8064 bit division in our benchmark is slower in MPIR than in GMP on K10.
Here is what the benchmark calls:

* mpz_tdiv_q -- 128 x 126 limbs
* mpn_tdiv_q -- 128 x 126 limbs
* mpn_sb_divappr_q -- 7 x 4 limbs

Here is what I have tried:

* corrected some bugs in speed (not relevant to the benchmark)
* timed mpn_sb_divappr_q at 7 x 4 limbs using speed : GMP is about 5-10% slower
* timed mpn_tdiv_q (the code that is executed is identical to that in GMP) : GMP is 1% faster
* tried replacing the MPIR mpz_tdiv_q code with the GMP code : no change
* GMP does memory allocation at the mpz level and mpn level, we do it only at the mpn level. I tried changing this : no change
* I timed mpn_sub_n, mpn_lshift_n, mpn_copy (the functions called along the way) : GMP is slower or the same speed for all of these
* I found and removed an orphaned memory allocation in mpz_tdiv_q : no change
* I tried combining two memory allocations in our mpn code into one : this slowed it down even more
* I checked the precomputed inverse uses almost identical code, it's probably 1 or 2 cycles slower in MPIR, but this is far too small to make a difference
* I made sure the same random numbers were being generated for MPIR and GMP
* I made sure our benchmark code generated new random numbers every 1024 iterations to ensure the algorithms weren't affected by the choice of numbers (in fact the time varies a lot when the numbers change)
* I tried the same compiler flags as GMP uses : no significant change
* tried --enable-alloca : no change
* tried both static and dynamic linking : makes at most 1% difference
* checked that MPIR is faster at this benchmark than GMP on penryn as expected

mpn_tdiv_q takes around 500 cycles, so we are talking about 30 cycles here. mpn_sb_divappr_q takes a little over 1/3 of that time.

Anyone have any brainwaves? I'm completely and utterly out of ideas. I've tried absolutely everything. I cannot think of a single additional thing to try!

1-3% is believable due to random C compiler issues. 6-7% is just not believable.
And in fact it is probably more than that since mpn_sb_divappr_q is faster in MPIR than GMP as are some of the other mpn functions called along the way.

Bill.
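For anyone wanting to reproduce the setup, here is a rough sketch of the kind of harness described above: operands are redrawn every 1024 iterations from a seeded generator, so MPIR and GMP can be fed the same numbers and the batch-to-batch spread is visible. This is not the actual benchmark code. The names `div_fn`, `bench_stats`, `fill_random`, `dummy_div` and `run_bench` are all made up for illustration, `dummy_div` is only a placeholder that does no real division, and the timing uses plain `clock()` rather than a cycle counter.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

#define BATCH 1024  /* iterations between operand refreshes, as in the post */

/* Routine under test: q = n / d on little-endian 64-bit limb arrays.
   Hypothetical wrapper type, not part of GMP or MPIR. */
typedef void (*div_fn)(uint64_t *q, const uint64_t *n, size_t nn,
                       const uint64_t *d, size_t dn);

typedef struct {
    double min_s, max_s, total_s;  /* fastest batch, slowest batch, sum */
    uint64_t checksum;             /* XOR of q[0] over batches; also keeps
                                      the calls from being optimised away */
} bench_stats;

/* xorshift64: small seeded generator, so two libraries can be given
   exactly the same operand sequence. The state must be nonzero. */
static void fill_random(uint64_t *p, size_t n, uint64_t *state)
{
    assert(*state != 0);
    for (size_t i = 0; i < n; i++) {
        uint64_t x = *state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        *state = x;
        p[i] = x;
    }
}

/* Placeholder workload so the sketch compiles alone; it does NOT divide. */
static void dummy_div(uint64_t *q, const uint64_t *n, size_t nn,
                      const uint64_t *d, size_t dn)
{
    for (size_t i = 0; i + dn <= nn; i++)
        q[i] = n[i + dn - 1] ^ d[dn - 1];
}

/* Time `batches` batches of BATCH calls, with fresh operands per batch.
   Returns 0 on success, -1 on bad arguments or allocation failure. */
static int run_bench(div_fn f, size_t nn, size_t dn, int batches,
                     uint64_t seed, bench_stats *out)
{
    if (!f || !out || dn == 0 || nn < dn || batches <= 0 || seed == 0)
        return -1;

    uint64_t *n = malloc(nn * sizeof *n);
    uint64_t *d = malloc(dn * sizeof *d);
    uint64_t *q = malloc((nn - dn + 1) * sizeof *q);
    if (!n || !d || !q) { free(n); free(d); free(q); return -1; }

    out->min_s = out->max_s = out->total_s = 0.0;
    out->checksum = 0;
    for (int b = 0; b < batches; b++) {
        fill_random(n, nn, &seed);
        fill_random(d, dn, &seed);
        d[dn - 1] |= UINT64_C(1) << 63;  /* keep the divisor's top bit set */

        clock_t t0 = clock();
        for (int i = 0; i < BATCH; i++)
            f(q, n, nn, d, dn);
        double s = (double)(clock() - t0) / CLOCKS_PER_SEC;

        if (b == 0 || s < out->min_s) out->min_s = s;
        if (s > out->max_s) out->max_s = s;
        out->total_s += s;
        out->checksum ^= q[0];
    }
    free(n); free(d); free(q);
    return 0;
}
```

To use it, one would wrap the real library call in a function matching `div_fn` and call something like `run_bench(wrapper, 128, 126, 100, 12345, &stats)` once per library with the same seed. Matching checksums would suggest both libraries saw the same operands and produced the same low quotient limb, and a large gap between `min_s` and `max_s` would reflect the operand-dependent variation mentioned above.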