For two days now I've been trying to find out why the 8192 x 8064 bit
division in our benchmark is slower with MPIR than with GMP on K10.

Here is what the benchmark calls:

* mpz_tdiv_q -- 128 x 126 limbs
* mpn_tdiv_q -- 128 x 126 limbs
* mpn_sb_divappr_q -- 7 x 4 limbs
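
For reference, the limb counts above map to operand sizes in bits as in the
sketch below. It assumes 64-bit limbs (an x86-64 build); the helper names are
made up for illustration and are not part of MPIR or GMP.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 64-bit limbs, as on an x86-64 build. */
enum { LIMB_BITS = 64 };

/* Number of bits held by a given number of limbs. */
static size_t limbs_to_bits(size_t limbs)
{
    return limbs * LIMB_BITS;
}

/* Number of limbs needed to hold a given number of bits (rounded up). */
static size_t bits_to_limbs(size_t bits)
{
    assert(bits > 0);
    return (bits + LIMB_BITS - 1) / LIMB_BITS;
}
```

With these, limbs_to_bits(128) gives the numerator size and
limbs_to_bits(126) the denominator size of the benchmarked division.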

Here is what I have tried:

* corrected some bugs in speed (not relevant to the benchmark)
* timed mpn_sb_divappr_q at 7 x 4 limbs using speed : GMP is about 5-10%
slower
* timed mpn_tdiv_q (the code that is executed is identical to that in GMP)
: GMP is 1% faster
* tried replacing the MPIR mpz_tdiv_q code with the GMP code : no change
* GMP does memory allocation at the mpz level and mpn level, we do it only
at the mpn level. I tried changing this : no change
* I timed mpn_sub_n, mpn_lshift_n, mpn_copy (the functions called along the
way) : GMP is slower or the same speed for all of these
* I found and removed an orphaned memory allocation in mpz_tdiv_q : no
change
* I tried combining two memory allocations in our mpn code into one : this
slowed it down even more
* I checked that the precomputed inverse uses almost identical code; it's
probably 1 or 2 cycles slower in MPIR, but this is far too small to make a
difference
* I made sure the same random numbers were being generated for MPIR and GMP
* I made sure our benchmark code generated new random numbers every 1024
iterations to ensure the algorithms weren't affected by the choice of
numbers (in fact the time varies a lot when the numbers change)
* I tried the same compiler flags as GMP uses : no significant change
* tried --enable-alloca : no change
* tried both static and dynamic linking : makes at most 1% difference
* checked that MPIR is faster at this benchmark than GMP on penryn as
expected
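
The input handling described in the list (same random numbers for both
libraries, fresh operands every 1024 iterations) can be sketched roughly as
follows. This is not the actual benchmark code: the PRNG, op_under_test and
run_benchmark are hypothetical stand-ins, and the timing calls are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* From the post: new random inputs every 1024 iterations. */
enum { REFRESH_INTERVAL = 1024 };

/* Small deterministic PRNG (xorshift64), so that two builds seeded
   identically see exactly the same operand sequence. */
typedef struct { uint64_t state; } rng_t;

static uint64_t rng_next(rng_t *r)
{
    uint64_t x = r->state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    r->state = x;
    return x;
}

/* Hypothetical stand-in for the operation being timed
   (the real benchmark calls mpz_tdiv_q). */
static uint64_t op_under_test(uint64_t a, uint64_t b)
{
    return a / (b | 1); /* force a nonzero divisor */
}

/* Runs iters iterations, drawing fresh operands every REFRESH_INTERVAL
   iterations. Returns a checksum of the results and, if refreshes is
   not NULL, stores how many times the operands were regenerated. */
static uint64_t run_benchmark(uint64_t seed, size_t iters, size_t *refreshes)
{
    assert(seed != 0); /* xorshift must not start from state zero */

    rng_t rng = { seed };
    uint64_t a = 0, b = 1, checksum = 0;
    size_t count = 0;

    for (size_t i = 0; i < iters; i++) {
        if (i % REFRESH_INTERVAL == 0) {
            a = rng_next(&rng);
            b = rng_next(&rng);
            count++;
        }
        checksum += op_under_test(a, b);
    }

    if (refreshes != NULL)
        *refreshes = count;
    return checksum;
}
```

Calling run_benchmark twice with the same seed should give the same checksum,
which is a cheap way to confirm that both libraries are being fed identical
inputs.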

mpn_tdiv_q takes around 500 cycles, so the gap we are talking about is only
about 30 cycles. mpn_sb_divappr_q takes a little over 1/3 of that time.
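
As a quick check of those figures, the relative size of the gap can be
computed as below. gap_percent is a hypothetical helper written for this
message, and the inputs are the approximate cycle counts quoted above.

```c
#include <assert.h>

/* Relative slowdown, in percent, for a given cycle gap. */
static double gap_percent(double gap_cycles, double total_cycles)
{
    assert(total_cycles > 0.0);
    return 100.0 * gap_cycles / total_cycles;
}
```

gap_percent(30.0, 500.0) comes to 6, in line with the 6-7% figure.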

Anyone have any brainwaves? I'm completely and utterly out of ideas. I've
tried absolutely everything. I cannot think of a single additional thing to
try!

1-3% is believable due to random C compiler issues. 6-7% is just not
believable. And in fact it is probably more than that since
mpn_sb_divappr_q is faster in MPIR than GMP as are some of the other mpn
functions called along the way.

Bill.
