t...@gmplib.org (Torbjörn Granlund) writes: > ni...@lysator.liu.se (Niels Möller) writes: > > For n x 20, n pretty large, what strategy does mpn_mul use? I would > expect repeated toom32, but maybe that gives too much overhed. It seems > we don't have any MUL_TOOM32_TO_BASECASE threshold? > > I believe this is the logic triggered in this case: > > if (4 * un < 5 * vn) > mpn_toom22_mul (prodp, up, un, vp, vn, scratch); > else if (4 * un < 7 * vn) > mpn_toom32_mul (prodp, up, un, vp, vn, scratch); > else > mpn_toom42_mul (prodp, up, un, vp, vn, scratch);
On my machine (coreibwl), MUL_TOOM22_THRESHOLD is 27. If we benchmark mpn_mul_basecase against mpn_toom32_mul sing speed, the relevant crossover is where the ratio is close to 2/3 (since the size argument for speed measurements of toom32 is the size of the larger operand). I get $ ./tune/speed -p 10000 -r -s 20-50 mpn_mul_basecase mpn_toom32_mul overhead 0.000000005 secs, precision 10000 units of 1.25e-09 secs, CPU freq 798.34 MHz mpn_mul_basecase mpn_toom32_mul [...] 28 0.000001875 #0.7651 29 0.000002008 #0.7434 30 0.000002140 #0.7094 31 0.000002262 #0.7525 32 0.000002452 #0.7030 33 0.000002600 #0.6905 34 0.000002690 #0.7143 35 0.000002898 #0.6693 36 0.000003024 #0.6575 37 0.000003208 #0.6714 38 0.000003376 #0.6508 39 0.000003515 #0.6446 40 0.000003723 #0.6422 41 0.000003917 #0.6343 42 0.000004000 #0.6245 43 0.000004282 #0.6178 [...] So the threshold measured this way would be around 38 limbs. Translated to vn, the size of the smaller operand, the threshold becomes 25 limbs, very close to the toom22 threshold. Similarly comparing mpn_mul_basecase to mpn_toom42_mul, the relevant crossover is where the ratio is close to 1/2. My measurements look very flat, but threshold should be somewhere in the range 67-79 limbs for the large operand, hence in the range 33-40 limbs in terms of the size of the smaller operand, 20%-50% above the toom22 threshold. So from this initial experiment, it seems it might work fine to use the same threshold for toom22 and toom32, if applied to the size of the smaller operand. But we should tune a separate threshold for toom42 vs basecase, and never apply toom42 for unbalanced operands with vn below that threshold. Which maybe isn't that surprising: toom32 just adds +1 to the set of of points, {0, -1, inf}, used by toom22, with little additional complexity. While toom42 adds the interpolation point +2, which adds quite a bit of complexity, including more shifts and one call to mpn_divexact_by3. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. _______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel