Hi I have implemented a faster integer root extraction based on
"Detecting Perfect Powers in Essentially Linear Time" Daniel J Bernstein http://cr.yp.to/papers.html The current root extraction always returns a remainder as well , but with the new algorithms it is faster not to calculate it for example ./speed -c -s 1-1000000 -f 1.6 mpn_root_new.3 mpn_rootrem_new.3 -r1 overhead 6.67 cycles, precision 1000000 units of 3.75e-10 secs, CPU freq 2664.77 MHz mpn_root_new.3 mpn_rootrem_new.3 1 #3694.89 1.1008 2 #4284.75 1.0678 3 #4430.08 1.1121 4 #4662.45 1.1732 6 #5726.39 1.1322 9 #6017.92 1.1127 14 #6978.72 1.1228 22 #8152.16 1.1638 35 #10558.69 1.2500 56 #15334.99 1.3346 89 #25025.73 1.3784 142 #46958.37 1.4195 227 #86848.08 1.5167 363 #174570.33 1.4936 580 #360345.67 1.5261 928 #731964.50 1.4979 1484 #1460732.00 1.4964 2374 #2918845.00 1.4811 3798 #5860142.00 1.4667 6076 #11373722.00 1.4723 9721 #21570835.00 1.3570 15553 #41697982.00 1.3366 24884 #75745224.00 1.2987 39814 #130672237.00 1.2807 63702 #234201455.00 1.2879 101923 #452620533.00 1.2303 163076 #735142933.00 1.2382 260921 #1259932350.00 1.2596 417473 #2466162074.00 1.1973 667956 #4040044473.15 1.2111 this is comparing the new root against the new root with remainder , and you can see that for cube roots we save 20% by not calculating the remainder ./speed -c -s 1-1000000 -f 1.6 mpn_root_new.11 mpn_rootrem_new.11 -r1 overhead 6.67 cycles, precision 1000000 units of 3.75e-10 secs, CPU freq 2664.77 MHz mpn_root_new.11 mpn_rootrem_new.11 1 #3471.90 1.0631 2 #5024.17 1.0491 3 #4921.21 1.0453 4 #4866.52 1.0658 6 #5505.43 1.1002 9 #4988.69 1.1495 14 #5568.22 1.1735 22 #7983.31 1.1400 35 #7774.96 1.3388 56 #9551.65 1.4950 89 #12238.06 1.8159 142 #15696.17 2.3840 227 #23076.17 3.1835 363 #39873.64 3.7148 580 #72337.19 4.1317 928 #143198.29 4.3477 1484 #288703.75 4.2155 2374 #588909.50 4.2622 3798 #1207692.00 4.1712 6076 #2393227.00 3.9869 9721 #4860592.00 3.7902 15553 #9635550.00 3.5641 24884 #18805253.00 3.4424 39814 #35785425.00 2.6833 63702 #65366705.00 2.6164 101923 #114183384.00 2.4796 163076 #202749025.00 2.5899 260921 #348500080.00 2.5154 417473 #637235393.00 2.3188 667956 #1015214092.00 2.5943 and for 11-th roots it is over twice as fast to not calculate the remainder , and it only gets better (ie 1001-th root is 100x speedup) How does this compare with the existing ./speed -c -s 1-1000000 -f 1.6 mpn_rootrem.3 mpn_rootrem_new.3 -r2 overhead 6.67 cycles, precision 1000000 units of 3.75e-10 secs, CPU freq 2664.77 MHz mpn_rootrem.3 mpn_rootrem_new.3 1 #0.5070 3864.95 2 #0.5938 4411.65 3 #0.2542 10696.65 4 #0.4086 11228.18 6 #0.4216 11841.79 9 #0.6229 12861.69 14 1.0253 #13613.28 22 1.4994 #15225.00 35 2.3086 #19700.35 56 2.8291 #33357.22 89 3.9932 #48141.96 142 5.1187 #86683.00 227 6.3146 #165246.14 363 6.6329 #328079.25 580 7.4431 #680612.00 928 8.1024 #1376246.00 1484 8.2181 #2731967.00 2374 8.8078 #5345203.00 3798 9.3735 #8587304.00 6076 9.3773 #16715719.00 9721 10.4492 #29254880.00 15553 11.2773 #55684068.00 24884 12.0694 #98372702.00 39814 12.3675 #167308412.00 63702 13.0026 #301692586.00 101923 13.9266 #558760850.00 163076 14.4037 #912504587.00 260921 14.9693 #1586573976.00 417473 15.6151 #2963410773.90 667956 16.4052 #4892824168.55 so for cube roots the new algorithm is up to 16x speedup ./speed -c -s 1-1000000 -f 1.6 mpn_rootrem.6 mpn_rootrem_new.6 -r2 overhead 6.67 cycles, precision 1000000 units of 3.75e-10 secs, CPU freq 2664.77 MHz mpn_rootrem.6 mpn_rootrem_new.6 1 #0.4230 4015.07 2 #0.4826 5001.39 3 #0.4489 6002.74 4 #0.5826 5972.63 6 #0.5538 6597.57 9 #0.9411 6649.87 14 1.2604 #8589.01 22 1.7959 #9131.34 35 2.8297 #11595.51 56 4.9101 #15742.77 89 6.7737 #24455.11 142 8.9538 #43350.88 227 11.0893 #80233.38 363 12.0200 #165172.43 580 13.9284 #325329.50 928 14.8314 #662709.50 1484 14.7727 #1355535.00 2374 15.3892 #2732746.00 3798 16.4676 #5320714.00 6076 16.5480 #10111893.00 9721 18.4109 #19234270.00 15553 19.6854 #37076658.00 24884 16.5990 #64584267.00 39814 16.0566 #109209274.00 63702 16.0017 #203845446.00 101923 16.8694 #363341873.00 163076 17.8948 #571863523.00 260921 18.1657 #1043338924.00 417473 18.3265 #1885156335.00 667956 19.3345 #3123304968.21 and for 6-th roots we are 20x speedup ./speed -c -s 1-1000000 -f 1.6 mpn_rootrem.1532 mpn_root_new.1532 -r2 overhead 6.67 cycles, precision 1000000 units of 3.75e-10 secs, CPU freq 2664.77 MHz mpn_rootrem.1532 mpn_root_new.1532 1 #0.0374 3262.44 2 #0.0170 3410.33 3 #0.0188 3269.75 4 #0.0200 3155.01 6 #0.0204 3214.03 9 #0.0225 3264.04 14 #0.0266 3329.49 22 #0.0345 3226.34 35 1.0942 #3566.89 56 3.7086 #3876.63 89 8.4034 #4131.03 142 15.7969 #6270.49 227 40.9515 #8352.82 363 94.7907 #10958.85 580 171.0272 #12884.93 928 307.8009 #15113.47 1484 711.6705 #14059.60 2374 1277.3062 #16899.00 3798 2546.8512 #16829.41 6076 4526.9584 #17086.22 9721 5806.8554 #25465.00 15553 9123.4035 #31556.31 24884 10879.8868 #47096.42 39814 10780.5283 #80652.07 63702 12266.8911 #145376.57 101923 10941.7488 #287397.00 163076 9076.3009 #591927.50 260921 8849.2853 #1269636.00 417473 8200.9631 #2512545.00 667956 7340.9630 #5057633.00 and just to show off , the 1532th root of a 4 million bit integer is 12 thousand faster than existing if we dont need the remainder. Jason --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "mpir-devel" group. To post to this group, send email to [email protected] To unsubscribe from this group, send email to [email protected] For more options, visit this group at http://groups.google.com/group/mpir-devel?hl=en -~----------~----~----~----~------~----~------~--~---
