On Tue, Jan 5, 2016 at 10:10 AM, Daniel Serpell <dserp...@gmail.com> wrote: > Hi!, > > El Tue, Jan 05, 2016 at 08:08:35AM -0800, Ganesh Ajjanagadde escribio: >> On Tue, Jan 5, 2016 at 7:44 AM, Daniel Serpell <dserp...@gmail.com> wrote: >> > Hi!, >> > >> > El Mon, Jan 04, 2016 at 06:33:59PM -0800, Ganesh Ajjanagadde escribio: >> >> This exploits an approach based on the sieve of Eratosthenes, a popular >> >> method for generating prime numbers. >> >> >> >> Tables are identical to previous ones. >> >> >> >> Tested with FATE with/without --enable-hardcoded-tables. >> >> >> >> Sample benchmark (Haswell, GNU/Linux+gcc): >> >> prev: >> >> 7860100 decicycles in cbrt_tableinit, 1 runs, 0 skips >> >> 7777490 decicycles in cbrt_tableinit, 2 runs, 0 skips >> >> [...] >> >> 7582339 decicycles in cbrt_tableinit, 256 runs, 0 skips >> >> 7563556 decicycles in cbrt_tableinit, 512 runs, 0 skips >> >> >> >> new: >> >> 2099480 decicycles in cbrt_tableinit, 1 runs, 0 skips >> >> 2044470 decicycles in cbrt_tableinit, 2 runs, 0 skips >> >> [...] >> >> 1796544 decicycles in cbrt_tableinit, 256 runs, 0 skips >> >> 1791631 decicycles in cbrt_tableinit, 512 runs, 0 skips >> >> >> > >> > See attached code, function "test1", based on an approximation of: >> > >> > (i+1)^(1/3) ~= i^(1/3) * ( 1 + 1/(3i) - 1/(9i) + 5/(81i) - .... ) >> >> I assume 1/(3i), 1/(9i^2), etc obtained via a Taylor series at x = 0. >> > > Yes, more specifically (in wxmaxima): > > (%i1) at( taylor((i+x)^(1/3)/i^(1/3), x, 0, 6) , x=1 ); > > 1 1 5 10 22 154 > (%o1) 1 + --- - ---- + ----- - ------ + ------ - ------- > 3 i 2 3 4 5 6 > 9 i 81 i 243 i 729 i 6561 i > > > I noticed that if you do a change of variable, the series simplifies: > > (%i2) at( taylor((i+x)^(1/3)/i^(1/3), x, 0, 6) , [ x=1, i = (1/3)/j ] ); > > 6 5 4 3 > 154 j 22 j 10 j 5 j 2 > (%o2) (- ------) + ----- - ----- + ---- - j + j + 1 > 9 3 3 3 > >> > >> > Generated values are the same as original floats (max error in double >> > is < 4*10^-10), it is faster (and I think, simpler) than your version. >> >> Had thought of these ideas, but did not examine as I was a little >> concerned about accuracy. Thanks, will give it a spin. Or >> alternatively, you can submit a patch since you put it into action. >> > > Best if you make the patch, as you can test the speed in your same setup.
Ok, I will still make you the author though. > >> Alternatively, one could directly expand the series for (i+1)^(4/3). > > Yes, but the first two coefficients are not 1 anymore, so it needs one > more multiplication, canceling the advantage: > > (%i3) at( taylor((i+x)^(4/3)/i^(4/3), x, 0, 6) , [ x=1, i = (4/3)/j ] ); > > 6 5 4 3 2 > 11 j j 5 j j j > (%o3) ----- - --- + ---- - -- + -- + j + 1 > 9216 384 768 48 8 Ok, thanks. > > >> And it may be possible to tighten the number of terms needed by >> expanding not about x = 0, but x = i to get i+1. Or fancier polynomial >> approximations can be used. Have you tried these? > > I think that this is what I'm already doing. As I said, the slowest part > is the first division ( r = (1.0/3.0) / i ), and I can't think of any way > to avoid it. I meant Chebyshev approximation obtained e.g via Remez; it may allow shaving the degree by one, but I can't think of a simple way to get rid of the divide. Also, coefficients will likely not be 1 anymore for leading terms. > > I altered the last constants to reduce the error, but stopped trying after > getting better than 10^-10 absolute error, that is higher than the > precision of a float. This is the main thing I slightly did not like: the perturbations are not documented and they were done in an ad-hoc way. Overall though, good work! Will do some more testing and post a patch tonight. > > Daniel. > > _______________________________________________ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > http://ffmpeg.org/mailman/listinfo/ffmpeg-devel _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel