t...@gmplib.org (Torbjörn Granlund) writes:

> I'm having problems with timing of the gcd_11 code. Unfortunately, the
> nested macros of speed.h make things hard to read. Could you
> double-check that operands to gcd_11 are odd and full limbs?

I'm fairly sure they are odd; I've run speed on a --disable-assembly
--enable-assert build. But they're full limbs only for -s 64.

This is what I see, after your asm changes:

$ ./tune/speed -p 100000 -c -s 1-64 mpn_gcd_1 mpn_gcd_11
overhead 5.01 cycles, precision 100000 units of 1.14e-09 secs, CPU freq 876.11 MHz
       mpn_gcd_1    mpn_gcd_11
1          12.05         #8.05
2          13.56         #8.69
3          16.23         #8.88
4          16.67         #9.37
5          20.04        #12.55
6          22.88        #16.06
7          28.21        #21.21
8          32.47        #27.34
9          34.53        #32.17
10         41.24        #38.47
11         43.65        #42.90
12        #49.50         50.26
13        #55.92         58.40
14        #57.88         61.12
15        #66.06         66.68
16        #68.26         71.43
17        #71.36         75.56
18        #77.95         81.72
19        #83.09         86.64
20        #84.53         89.62
21        #86.88         93.23
22        #97.32         98.35
23        101.75       #101.46
24       #107.63        108.32
25       #110.05        110.30
26        118.25       #114.15
27       #115.00        119.02
28       #118.93        123.26
29       #123.22        124.74
30       #127.43        131.10
31       #134.56        139.14
32        139.34       #139.21
33        142.63       #141.76
34       #145.42        147.85
35       #147.92        150.54
36       #151.31        154.07
37       #152.37        159.75
38       #155.48        162.92
39        168.28       #164.72
40       #163.92        169.57
41        177.38       #171.68
42       #183.22        185.21
43       #188.94        189.92
44       #195.21        196.79
45        198.06       #197.09
46       #200.67        201.34
47       #203.37        205.09
48        208.80       #208.16
49       #212.83        216.31
50       #215.69        217.80
51       #219.62        228.48
52        237.99       #228.64
53        222.52       #219.11
54       #219.05        222.15
55       #221.39        227.44
56       #225.41        230.09
57       #226.36        235.82
58       #231.66        242.04
59       #239.70        246.53
60       #239.07        246.82
61       #242.48        249.31
62       #244.55        254.03
63       #249.94        257.11
64       #254.20        262.47

I.e., differences of just a few cycles, but disappointingly not always
in favor of the new gcd_11.

If we fix the size of one of the operands, the difference due to
initial reduction in gcd_1 is very visible:

$ ./tune/speed -p 100000 -c -s 1-64 mpn_gcd_1.10 mpn_gcd_11.10
overhead 5.01 cycles, precision 100000 units of 9.00e-10 secs, CPU freq 1111.62 MHz
    mpn_gcd_1.10 mpn_gcd_11.10
1          48.56        #31.68
2          47.65        #30.24
3          48.78        #33.06
4          49.02        #33.15
5          50.00        #31.07
6          48.74        #34.44
7          52.14        #35.24
8          52.51        #37.61
9          46.81        #39.10
10         56.40        #41.97
11         59.66        #42.12
12         62.68        #47.73
13         71.72        #49.63
14         84.10        #54.56
15        102.05        #57.25
16        117.86        #63.12
17        125.77        #58.76
18        120.16        #63.45
19        123.74        #66.41
20        145.42        #82.43
21        143.24        #83.48
22        147.30        #86.79
23        139.39        #81.41
24        126.56        #84.67
25        124.91        #86.97
26        126.57        #88.07
27        149.59       #102.50
28        146.80       #109.59
29        146.96       #103.42
30        140.75       #101.46
31        127.92       #103.95
32        126.41       #108.93
33        126.68       #112.32
34        142.09       #126.03
35        143.65       #130.04
36        142.99       #130.96
37        143.80       #131.02
38        127.91       #126.18
39       #127.76        129.48
40       #128.44        130.90
41        151.79       #142.34
42       #142.43        145.96
43       #142.62        146.64
44       #128.56        145.52
45       #129.30        150.46
46       #129.68        155.30
47       #129.07        159.03
48       #132.37        158.59
49       #126.59        158.04
50       #128.14        161.57
51       #129.74        164.20
52       #127.63        166.14
53       #130.12        170.52
54       #130.41        175.19
55       #133.91        173.54
56       #129.01        176.41
57       #127.91        178.71
58       #126.03        181.28
59       #129.15        185.43
60       #130.57        188.86
61       #130.14        190.29
62       #127.46        194.79
63       #129.22        196.25
64       #127.74        202.10

> Speaking of gcd_22. We need to determine this function's interface.

I've been considering

  void mpn_gcd_22 (mp_ptr rp, mp_limb_t u1, mp_limb_t u0, mp_limb_t v1, mp_limb_t v0);

with the requirement that u0 and v0 are odd. Do you prefer something
different? I'm thinking that it should not be a public function.

> The last loop will be 11. We can simply inline a copy here as it is
> tiny. (A tail call won't work as the functions will have different
> return types.)

Except if we have lots of tuned variants of gcd_11 and fewer gcd_22;
then gcd_22 ought to call the separately selected gcd_11.
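
To make the proposed interface concrete, here is a rough sketch in plain C of how a gcd_22 with that signature could hand over to gcd_11 once both high limbs are zero. This is not the GMP code, just an illustration under assumptions: limbs are taken to be 64 bits, the types are local stand-ins for the ones in gmp.h, the names sketch_gcd_11 and sketch_gcd_22 are made up, __builtin_ctzll is the GCC/Clang builtin, and storing the low limb in rp[0] and the high limb in rp[1] is my guess at the convention.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 64-bit limbs. Local stand-ins for the gmp.h types so the
   sketch compiles on its own. */
typedef uint64_t mp_limb_t;
typedef mp_limb_t *mp_ptr;

/* Hypothetical name. Plain binary gcd of two odd limbs. */
static mp_limb_t
sketch_gcd_11 (mp_limb_t u, mp_limb_t v)
{
  assert ((u & 1) && (v & 1));
  while (u != v)
    {
      if (u < v)
        {
          mp_limb_t t = u; u = v; v = t;
        }
      u -= v;                     /* odd - odd: even and nonzero */
      u >>= __builtin_ctzll (u);  /* strip trailing zeros, u is odd again */
    }
  return u;
}

/* Hypothetical name. gcd of {u1,u0} and {v1,v0}, with u0 and v0 odd.
   Assumed result layout: rp[0] low limb, rp[1] high limb. */
static void
sketch_gcd_22 (mp_ptr rp, mp_limb_t u1, mp_limb_t u0,
               mp_limb_t v1, mp_limb_t v0)
{
  assert ((u0 & 1) && (v0 & 1));
  while ((u1 | v1) != 0)
    {
      if (u1 == v1 && u0 == v0)
        {
          rp[0] = u0; rp[1] = u1;   /* gcd does not fit in one limb */
          return;
        }
      if (u1 < v1 || (u1 == v1 && u0 < v0))
        {
          mp_limb_t t;
          t = u1; u1 = v1; v1 = t;
          t = u0; u0 = v0; v0 = t;
        }
      /* Now u > v. Two-limb subtract; the borrow is taken from the
         low limbs before u0 is overwritten. */
      u1 -= v1 + (u0 < v0);
      u0 -= v0;
      if (u0 == 0)
        {
          u0 = u1; u1 = 0;          /* shift right by a whole limb */
        }
      int c = __builtin_ctzll (u0);
      if (c > 0)                    /* c is in 1..63 here */
        {
          u0 = (u0 >> c) | (u1 << (64 - c));
          u1 >>= c;
        }
    }
  /* Both operands fit in one limb and are odd: the last loop is 11. */
  rp[0] = sketch_gcd_11 (u0, v0);
  rp[1] = 0;
}
```

The final call is where the choice discussed above shows up: it can either be an inlined copy of the 11 loop, or a call to whichever tuned gcd_11 was selected.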

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.