Ciao, Il Mar, 9 Giugno 2015 7:02 pm, Torbjörn Granlund ha scritto: > bodr...@mail.dm.unipi.it writes:
> We should probably use sqr whenever possible, perhaps it is worth having > a condition for this in rootrem.c, (I believe mpn_pow_1 gets it right, > and uses sqr whenever possible, since I looked into that as an mpn_pow_1 correctly uses sqr, the problem within rootrem arises only when k==2. The two lines of code: wn = mpn_pow_1 (wp, sp, sn, k - 1, qp); mpn_mul (qp, wp, wn, sp, sn); degenerate with k=2. mpn_pow_1 is used to perform an MPN_COPY so that the two operands passed to mpn_mul contain the same number... Maybe an initial condition if (k == 2) return mpn_sqrtrem (....), but the specialised code for a single k should not get inside the generic rootrem function, in my opinion. I wrote the hack only to understand the algorithm... I'll clean up rootrem somehow, but I'll not insert special code. > tune/speed -r -p10000000 -s16-10000000 -f1.438 mpn_sqrt mpn_root.2 > I get about 0.97 on average. This can be healed using in sqrtrem the same strategy that rootrem uses to skip the last squaring if the reminder is not required: adding a couple of zero limbs and compute a slightly larger root... > And > tune/speed -r -p10000000 -s16-10000000 -f1.438 mpn_sqrtrem > mpn_rootrem.2 > gives around 1.25. This is not bad, a generic code can be slightly slower than a specialised one. > The last functions called by mpn_sqrtrem when computing the root of a > 2000 > limb operand are: > > mpn_divrem nn=1000/dn=500 -> qn=500,rn=500 > mpn_sqr an=500 ^ 2 -> rn=1000 > > Similarly, for mpn_rootrem: > > mpn_div_q nn=1000/dn=501 -> qn=499 > mpn_sqr an=1000 ^ 2 -> rn=2000 > > Interesting indeed. I wonder why they need different multiply sizes. sqrtrem obtain the reminder with divrem and needs only the b^2 part of (ax+b)^2 to update it. rootrem rebuild the reminder from scratch every loop... because the generic (ax+b)^k has too many components. > I am not sure that div_q vs divrem matters a whole lot for this usage > (i.e., 2n/n sizes). The manual says, about mpn_divrem: This function is obsolete. Please call mpn_tdiv_qr instead for best performance. With a little scratch space passed to mpn_dc_sqrtrem, it's easy to stop using divrem, but... will the performance really benefit of this? Regards, m -- http://bodrato.it/toom-cook/ _______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel