I have uploaded the new root code , and it should be there in an hour
or so.
I have changed the syntax of the undocumented  internal function
mpn_rootrem to suit the new code better.
The return value of this function is now only defined in the case when
you calculate the remainder.
I will adjust the two mpz functions that use this and introduce a new
mpz function to take advantage of this feature.

Jason

On Aug 26, 11:44 pm, Jason Moxham <[email protected]> wrote:
> Hi
>
> I have implemented a faster integer root extraction based on
>
> "Detecting Perfect Powers in Essentially Linear Time"
> Daniel J Bernstein  http://cr.yp.to/papers.html
>
> The current root extraction always returns a remainder as well , but with the
> new algorithms it  is faster not to calculate it  for example
>
> ./speed -c -s 1-1000000 -f 1.6 mpn_root_new.3 mpn_rootrem_new.3 -r1
> overhead 6.67 cycles, precision 1000000 units of 3.75e-10 secs, CPU freq
> 2664.77 MHz
>         mpn_root_new.3 mpn_rootrem_new.3
> 1            #3694.89        1.1008
> 2            #4284.75        1.0678
> 3            #4430.08        1.1121
> 4            #4662.45        1.1732
> 6            #5726.39        1.1322
> 9            #6017.92        1.1127
> 14           #6978.72        1.1228
> 22           #8152.16        1.1638
> 35          #10558.69        1.2500
> 56          #15334.99        1.3346
> 89          #25025.73        1.3784
> 142         #46958.37        1.4195
> 227         #86848.08        1.5167
> 363        #174570.33        1.4936
> 580        #360345.67        1.5261
> 928        #731964.50        1.4979
> 1484      #1460732.00        1.4964
> 2374      #2918845.00        1.4811
> 3798      #5860142.00        1.4667
> 6076     #11373722.00        1.4723
> 9721     #21570835.00        1.3570
> 15553    #41697982.00        1.3366
> 24884    #75745224.00        1.2987
> 39814   #130672237.00        1.2807
> 63702   #234201455.00        1.2879
> 101923  #452620533.00        1.2303
> 163076  #735142933.00        1.2382
> 260921  #1259932350.00        1.2596
> 417473  #2466162074.00        1.1973
> 667956  #4040044473.15        1.2111
>
> this is comparing the new root against the new root with remainder  , and you
> can see that for cube roots we save 20% by not calculating the remainder
>
> ./speed -c -s 1-1000000 -f 1.6 mpn_root_new.11 mpn_rootrem_new.11 -r1
> overhead 6.67 cycles, precision 1000000 units of 3.75e-10 secs, CPU freq
> 2664.77 MHz
>         mpn_root_new.11 mpn_rootrem_new.11
> 1            #3471.90        1.0631
> 2            #5024.17        1.0491
> 3            #4921.21        1.0453
> 4            #4866.52        1.0658
> 6            #5505.43        1.1002
> 9            #4988.69        1.1495
> 14           #5568.22        1.1735
> 22           #7983.31        1.1400
> 35           #7774.96        1.3388
> 56           #9551.65        1.4950
> 89          #12238.06        1.8159
> 142         #15696.17        2.3840
> 227         #23076.17        3.1835
> 363         #39873.64        3.7148
> 580         #72337.19        4.1317
> 928        #143198.29        4.3477
> 1484       #288703.75        4.2155
> 2374       #588909.50        4.2622
> 3798      #1207692.00        4.1712
> 6076      #2393227.00        3.9869
> 9721      #4860592.00        3.7902
> 15553     #9635550.00        3.5641
> 24884    #18805253.00        3.4424
> 39814    #35785425.00        2.6833
> 63702    #65366705.00        2.6164
> 101923  #114183384.00        2.4796
> 163076  #202749025.00        2.5899
> 260921  #348500080.00        2.5154
> 417473  #637235393.00        2.3188
> 667956  #1015214092.00        2.5943
>
> and for 11-th roots it is over twice as fast to not calculate the remainder ,
> and it only gets better (ie 1001-th root is 100x speedup)
>
> How does this compare with the existing
> ./speed -c -s 1-1000000 -f 1.6 mpn_rootrem.3 mpn_rootrem_new.3 -r2
> overhead 6.67 cycles, precision 1000000 units of 3.75e-10 secs, CPU freq
> 2664.77 MHz
>         mpn_rootrem.3 mpn_rootrem_new.3
> 1             #0.5070       3864.95
> 2             #0.5938       4411.65
> 3             #0.2542      10696.65
> 4             #0.4086      11228.18
> 6             #0.4216      11841.79
> 9             #0.6229      12861.69
> 14             1.0253     #13613.28
> 22             1.4994     #15225.00
> 35             2.3086     #19700.35
> 56             2.8291     #33357.22
> 89             3.9932     #48141.96
> 142            5.1187     #86683.00
> 227            6.3146    #165246.14
> 363            6.6329    #328079.25
> 580            7.4431    #680612.00
> 928            8.1024   #1376246.00
> 1484           8.2181   #2731967.00
> 2374           8.8078   #5345203.00
> 3798           9.3735   #8587304.00
> 6076           9.3773  #16715719.00
> 9721          10.4492  #29254880.00
> 15553         11.2773  #55684068.00
> 24884         12.0694  #98372702.00
> 39814         12.3675 #167308412.00
> 63702         13.0026 #301692586.00
> 101923        13.9266 #558760850.00
> 163076        14.4037 #912504587.00
> 260921        14.9693 #1586573976.00
> 417473        15.6151 #2963410773.90
> 667956        16.4052 #4892824168.55
>
> so for cube roots the new algorithm is up to 16x speedup
>
> ./speed -c -s 1-1000000 -f 1.6 mpn_rootrem.6 mpn_rootrem_new.6 -r2
> overhead 6.67 cycles, precision 1000000 units of 3.75e-10 secs, CPU freq
> 2664.77 MHz
>         mpn_rootrem.6 mpn_rootrem_new.6
> 1             #0.4230       4015.07
> 2             #0.4826       5001.39
> 3             #0.4489       6002.74
> 4             #0.5826       5972.63
> 6             #0.5538       6597.57
> 9             #0.9411       6649.87
> 14             1.2604      #8589.01
> 22             1.7959      #9131.34
> 35             2.8297     #11595.51
> 56             4.9101     #15742.77
> 89             6.7737     #24455.11
> 142            8.9538     #43350.88
> 227           11.0893     #80233.38
> 363           12.0200    #165172.43
> 580           13.9284    #325329.50
> 928           14.8314    #662709.50
> 1484          14.7727   #1355535.00
> 2374          15.3892   #2732746.00
> 3798          16.4676   #5320714.00
> 6076          16.5480  #10111893.00
> 9721          18.4109  #19234270.00
> 15553         19.6854  #37076658.00
> 24884         16.5990  #64584267.00
> 39814         16.0566 #109209274.00
> 63702         16.0017 #203845446.00
> 101923        16.8694 #363341873.00
> 163076        17.8948 #571863523.00
> 260921        18.1657 #1043338924.00
> 417473        18.3265 #1885156335.00
> 667956        19.3345 #3123304968.21
>
> and for 6-th roots we are 20x speedup
>
> ./speed -c -s 1-1000000 -f 1.6 mpn_rootrem.1532 mpn_root_new.1532 -r2
> overhead 6.67 cycles, precision 1000000 units of 3.75e-10 secs, CPU freq
> 2664.77 MHz
>         mpn_rootrem.1532 mpn_root_new.1532
> 1             #0.0374       3262.44
> 2             #0.0170       3410.33
> 3             #0.0188       3269.75
> 4             #0.0200       3155.01
> 6             #0.0204       3214.03
> 9             #0.0225       3264.04
> 14            #0.0266       3329.49
> 22            #0.0345       3226.34
> 35             1.0942      #3566.89
> 56             3.7086      #3876.63
> 89             8.4034      #4131.03
> 142           15.7969      #6270.49
> 227           40.9515      #8352.82
> 363           94.7907     #10958.85
> 580          171.0272     #12884.93
> 928          307.8009     #15113.47
> 1484         711.6705     #14059.60
> 2374        1277.3062     #16899.00
> 3798        2546.8512     #16829.41
> 6076        4526.9584     #17086.22
> 9721        5806.8554     #25465.00
> 15553       9123.4035     #31556.31
> 24884      10879.8868     #47096.42
> 39814      10780.5283     #80652.07
> 63702      12266.8911    #145376.57
> 101923     10941.7488    #287397.00
> 163076      9076.3009    #591927.50
> 260921      8849.2853   #1269636.00
> 417473      8200.9631   #2512545.00
> 667956      7340.9630   #5057633.00
>
> and just to show off , the 1532th root of a 4 million bit integer is 12
> thousand faster than existing if we dont need the remainder.
>
> Jason
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"mpir-devel" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/mpir-devel?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to