Hi

I have implemented a faster integer root extraction based on

"Detecting Perfect Powers in Essentially Linear Time" 
Daniel J Bernstein  
http://cr.yp.to/papers.html

The current root extraction always returns a remainder as well , but with the 
new algorithms it  is faster not to calculate it  for example

./speed -c -s 1-1000000 -f 1.6 mpn_root_new.3 mpn_rootrem_new.3 -r1
overhead 6.67 cycles, precision 1000000 units of 3.75e-10 secs, CPU freq 
2664.77 MHz
        mpn_root_new.3 mpn_rootrem_new.3
1            #3694.89        1.1008
2            #4284.75        1.0678
3            #4430.08        1.1121
4            #4662.45        1.1732
6            #5726.39        1.1322
9            #6017.92        1.1127
14           #6978.72        1.1228
22           #8152.16        1.1638
35          #10558.69        1.2500
56          #15334.99        1.3346
89          #25025.73        1.3784
142         #46958.37        1.4195
227         #86848.08        1.5167
363        #174570.33        1.4936
580        #360345.67        1.5261
928        #731964.50        1.4979
1484      #1460732.00        1.4964
2374      #2918845.00        1.4811
3798      #5860142.00        1.4667
6076     #11373722.00        1.4723
9721     #21570835.00        1.3570
15553    #41697982.00        1.3366
24884    #75745224.00        1.2987
39814   #130672237.00        1.2807
63702   #234201455.00        1.2879
101923  #452620533.00        1.2303
163076  #735142933.00        1.2382
260921  #1259932350.00        1.2596
417473  #2466162074.00        1.1973
667956  #4040044473.15        1.2111

this is comparing the new root against the new root with remainder  , and you 
can see that for cube roots we save 20% by not calculating the remainder

./speed -c -s 1-1000000 -f 1.6 mpn_root_new.11 mpn_rootrem_new.11 -r1
overhead 6.67 cycles, precision 1000000 units of 3.75e-10 secs, CPU freq 
2664.77 MHz
        mpn_root_new.11 mpn_rootrem_new.11
1            #3471.90        1.0631
2            #5024.17        1.0491
3            #4921.21        1.0453
4            #4866.52        1.0658
6            #5505.43        1.1002
9            #4988.69        1.1495
14           #5568.22        1.1735
22           #7983.31        1.1400
35           #7774.96        1.3388
56           #9551.65        1.4950
89          #12238.06        1.8159
142         #15696.17        2.3840
227         #23076.17        3.1835
363         #39873.64        3.7148
580         #72337.19        4.1317
928        #143198.29        4.3477
1484       #288703.75        4.2155
2374       #588909.50        4.2622
3798      #1207692.00        4.1712
6076      #2393227.00        3.9869
9721      #4860592.00        3.7902
15553     #9635550.00        3.5641
24884    #18805253.00        3.4424
39814    #35785425.00        2.6833
63702    #65366705.00        2.6164
101923  #114183384.00        2.4796
163076  #202749025.00        2.5899
260921  #348500080.00        2.5154
417473  #637235393.00        2.3188
667956  #1015214092.00        2.5943

and for 11-th roots it is over twice as fast to not calculate the remainder , 
and it only gets better (ie 1001-th root is 100x speedup)

How does this compare with the existing
./speed -c -s 1-1000000 -f 1.6 mpn_rootrem.3 mpn_rootrem_new.3 -r2
overhead 6.67 cycles, precision 1000000 units of 3.75e-10 secs, CPU freq 
2664.77 MHz
        mpn_rootrem.3 mpn_rootrem_new.3
1             #0.5070       3864.95
2             #0.5938       4411.65
3             #0.2542      10696.65
4             #0.4086      11228.18
6             #0.4216      11841.79
9             #0.6229      12861.69
14             1.0253     #13613.28
22             1.4994     #15225.00
35             2.3086     #19700.35
56             2.8291     #33357.22
89             3.9932     #48141.96
142            5.1187     #86683.00
227            6.3146    #165246.14
363            6.6329    #328079.25
580            7.4431    #680612.00
928            8.1024   #1376246.00
1484           8.2181   #2731967.00
2374           8.8078   #5345203.00
3798           9.3735   #8587304.00
6076           9.3773  #16715719.00
9721          10.4492  #29254880.00
15553         11.2773  #55684068.00
24884         12.0694  #98372702.00
39814         12.3675 #167308412.00
63702         13.0026 #301692586.00
101923        13.9266 #558760850.00
163076        14.4037 #912504587.00
260921        14.9693 #1586573976.00
417473        15.6151 #2963410773.90
667956        16.4052 #4892824168.55

so for cube roots the new algorithm is up to 16x speedup

./speed -c -s 1-1000000 -f 1.6 mpn_rootrem.6 mpn_rootrem_new.6 -r2
overhead 6.67 cycles, precision 1000000 units of 3.75e-10 secs, CPU freq 
2664.77 MHz
        mpn_rootrem.6 mpn_rootrem_new.6
1             #0.4230       4015.07
2             #0.4826       5001.39
3             #0.4489       6002.74
4             #0.5826       5972.63
6             #0.5538       6597.57
9             #0.9411       6649.87
14             1.2604      #8589.01
22             1.7959      #9131.34
35             2.8297     #11595.51
56             4.9101     #15742.77
89             6.7737     #24455.11
142            8.9538     #43350.88
227           11.0893     #80233.38
363           12.0200    #165172.43
580           13.9284    #325329.50
928           14.8314    #662709.50
1484          14.7727   #1355535.00
2374          15.3892   #2732746.00
3798          16.4676   #5320714.00
6076          16.5480  #10111893.00
9721          18.4109  #19234270.00
15553         19.6854  #37076658.00
24884         16.5990  #64584267.00
39814         16.0566 #109209274.00
63702         16.0017 #203845446.00
101923        16.8694 #363341873.00
163076        17.8948 #571863523.00
260921        18.1657 #1043338924.00
417473        18.3265 #1885156335.00
667956        19.3345 #3123304968.21

and for 6-th roots we are 20x speedup

./speed -c -s 1-1000000 -f 1.6 mpn_rootrem.1532 mpn_root_new.1532 -r2
overhead 6.67 cycles, precision 1000000 units of 3.75e-10 secs, CPU freq 
2664.77 MHz
        mpn_rootrem.1532 mpn_root_new.1532
1             #0.0374       3262.44
2             #0.0170       3410.33
3             #0.0188       3269.75
4             #0.0200       3155.01
6             #0.0204       3214.03
9             #0.0225       3264.04
14            #0.0266       3329.49
22            #0.0345       3226.34
35             1.0942      #3566.89
56             3.7086      #3876.63
89             8.4034      #4131.03
142           15.7969      #6270.49
227           40.9515      #8352.82
363           94.7907     #10958.85
580          171.0272     #12884.93
928          307.8009     #15113.47
1484         711.6705     #14059.60
2374        1277.3062     #16899.00
3798        2546.8512     #16829.41
6076        4526.9584     #17086.22
9721        5806.8554     #25465.00
15553       9123.4035     #31556.31
24884      10879.8868     #47096.42
39814      10780.5283     #80652.07
63702      12266.8911    #145376.57
101923     10941.7488    #287397.00
163076      9076.3009    #591927.50
260921      8849.2853   #1269636.00
417473      8200.9631   #2512545.00
667956      7340.9630   #5057633.00

and just to show off , the 1532th root of a 4 million bit integer is 12 
thousand faster than existing if we dont need the remainder.

Jason

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"mpir-devel" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/mpir-devel?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to