I found another speedup: combine the inner transforms with the pointwise mults, and combine the IFFT with the normalisation, in a cache-friendly way.
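To illustrate the kind of fusion meant here, below is a minimal sketch in C. It is not the MPIR code: `BLOCK`, `inner_transform`, `pointwise_mul` and the two driver functions are hypothetical names, and the "inner transform" is a toy Walsh-Hadamard butterfly standing in for the real one. The point is only the loop structure: the fused version does all the work on a block while it is still in cache, instead of sweeping the whole arrays three times.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK 8  /* hypothetical block length (power of two) */

/* Toy stand-in for an inner transform: in-place Walsh-Hadamard
   butterflies on one block, arithmetic wraps mod 2^64. */
static void inner_transform(uint64_t *x)
{
    for (size_t h = 1; h < BLOCK; h *= 2)
        for (size_t i = 0; i < BLOCK; i += 2 * h)
            for (size_t j = i; j < i + h; j++) {
                uint64_t u = x[j], v = x[j + h];
                x[j] = u + v;
                x[j + h] = u - v;
            }
}

/* Pointwise product of one block: a[i] *= b[i]. */
static void pointwise_mul(uint64_t *a, const uint64_t *b)
{
    for (size_t i = 0; i < BLOCK; i++)
        a[i] *= b[i];
}

/* Unfused: three separate sweeps over the full arrays, so each
   block may be evicted from cache between sweeps. */
void transform_mul_unfused(uint64_t *a, uint64_t *b, size_t n)
{
    assert(n % BLOCK == 0);
    for (size_t k = 0; k < n; k += BLOCK) inner_transform(a + k);
    for (size_t k = 0; k < n; k += BLOCK) inner_transform(b + k);
    for (size_t k = 0; k < n; k += BLOCK) pointwise_mul(a + k, b + k);
}

/* Fused: one sweep; each pair of blocks is transformed and
   multiplied before moving on to the next pair. */
void transform_mul_fused(uint64_t *a, uint64_t *b, size_t n)
{
    assert(n % BLOCK == 0);
    for (size_t k = 0; k < n; k += BLOCK) {
        inner_transform(a + k);
        inner_transform(b + k);
        pointwise_mul(a + k, b + k);
    }
}
```

Both drivers compute the same values; only the memory access order differs. The same pattern would apply to doing the normalisation on each block right after its part of the IFFT, rather than in a separate pass.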
This results in about a 5% speedup over much of the range. Here are the times now.

       bits  iters    mpir    this     gmp   n   w
     195840   1000  1.149s  1.105s  0.997s   7  16
     261120   1000  1.483s  1.415s  1.396s   7  16
     391296    100  0.261s  0.248s  0.282s   8   8
     521728    100  0.344s  0.315s  0.411s   8   8
     782592    100  0.577s  0.539s  0.628s   9   4
    1043456    100  0.706s  0.688s  0.848s   9   4
    1569024    100  1.229s  1.153s  1.317s   9   8
    2092032    100  1.543s  1.440s  2.765s   9   8
    3127296     10  0.283s  0.266s  0.408s  11   1
    4169728     10  0.357s  0.335s  0.543s  11   1
    6273024     10  0.621s  0.597s  0.843s  11   2
    8364032     10  0.831s  0.742s  1.156s  11   2
   12539904     10  1.441s  1.394s  1.798s  12   1
   16719872      1  0.230s  0.205s  0.288s  12   1
   25122816      1  0.379s  0.336s  0.434s  12   2
   33497088      1  0.524s  0.428s  0.646s  12   2
   50245632      1  0.833s  0.693s  1.035s  13   1
   66994176      1  1.596s  0.896s  1.358s  13   1
  100577280      1  1.906s  1.552s  2.177s  13   2
  134103040      1  2.784s  2.076s  2.984s  13   2
  201129984      1  3.971s  3.158s  4.536s  14   1
  268173312      1  5.146s  4.137s  5.781s  14   1
  402456576      1  7.548s  6.443s  9.867s  14   2
  536608768      1  9.841s  8.365s  12.71s  14   2
  804913152      1  15.48s  13.29s  20.06s  15   1
 1073217536      1  21.17s  17.16s  27.19s  15   1
 1610219520      1  31.64s  28.60s  43.37s  15   2
 2146959360      1  43.25s  37.02s  57.66s  15   2
 3220340736      1  70.14s  58.09s  92.94s  16   1
 4293787648      1  96.00s  74.26s  146.1s  16   1
 6441566208      1  150.2s  131.1s  217.5s  16   2
 8588754944      1  208.4s  175.0s  312.8s  16   2
12883132416      1  327.4s  278.6s  447.7s  17   1
17177509888      1  485.0s  360.ss  614.2s  17   1

Bill.
