On Sat, Dec 12, 2015 at 1:17 PM, Michael Niedermayer <michae...@gmx.at> wrote: [...]
>> >> > The exp2f expressions are:
>> >> > exp2f(sbr->data[0].env_facs_q[e][k] * alpha + 7.0f);
>> >> > exp2f((pan_offset - sbr->data[1].env_facs_q[e][k]) * alpha);
>> >> > exp2f(NOISE_FLOOR_OFFSET - sbr->data[0].noise_facs_q[e][k] + 1);
>> >> > exp2f(12 - sbr->data[1].noise_facs_q[e][k]);
>> >> > exp2f(alpha * sbr->data[ch].env_facs_q[e][k] + 6.0f);
>> >> > exp2f(NOISE_FLOOR_OFFSET - sbr->data[ch].noise_facs_q[e][k]);
>> >> >
>> >> > Here alpha is 1 or 0.5, pan_offset is 12 or 24, and NOISE_FLOOR_OFFSET
>> >> > is 6. After patch 3 of this series, env_facs_q is in the range from 0
>> >> > to 127, and noise_facs_q is already limited to the range from 0 to 30.
>> >> >
>> >> > So x should always be in the range -300..300, or so.
>> >>
>> >> Very good, thanks a lot.
>> >>
>> >> Based on the above range, my idea is to not even use a LUT, but to use
>> >> something like exp2fi followed by a multiplication by M_SQRT2 depending
>> >> on even or odd.
>> >
>> > conditional operations can, due to branch misprediction, be potentially
>> > rather slow
>>
>> I think it will still be far faster than exp2f, and in the absence of
>> hard numbers, I view this as a far better approach than a large (~300
>> element) LUT. Of course, the proof and extent of this will need to
>> wait for actual benches.
>
> alternatively one could do a
>
> if (x+A < (unsigned)B)
>     LUT[x+A]
> else
>     exp2whatever(x)
>
> the range in practice should be much smaller than +-300

That still uses a branch, so unless for whatever reason the numbers tend
to concentrate in an interval (which you believe, but I am agnostic about
since I don't know AAC), this is code complexity for little gain.
Furthermore, that B then becomes a "voodoo" constant, and performance may
vary across files depending on how concentrated the inputs are. I
personally don't like such things unless they are very well justified,
after all other easy, uniform methods of optimization are exhausted.

> also the LUT can possibly be shared between codecs

This is something for which there is plenty of low-hanging fruit across
FFmpeg. The best example I know of is the sin/cos tables used across dct,
fft, and other areas: a lot of them can be derived from sin(2*pi*i/65536)
for 0 <= i <= 65536/4, with cheap runtime derivation via index striding
and symmetry exploitation (e.g. 16*i, 32*i, flipping the sign). I will
work on this only after other things are sorted out, so in the meantime
it is up for grabs.

Complications come from threading issues in the case of dynamic init,
since one needs to place a lock of some sort to avoid write contention.
And --enable-hardcoded-tables does not help at all; it only distracts
from the key issues, since one then needs to reason about both cases.

Unfortunately, the question of static vs. dynamic tables is something
that can easily get bikeshedded to death. I am trying to improve the
default dynamic initialization to help resolve such things; see e.g. the
aac_tablegen work. My goal is to first improve the runtime generation
code and, once that is done, ask the ML for decisions where the answer is
relatively clear cut. I might be too optimistic here about removing
--enable-hardcoded-tables entirely via hard decisions on static vs.
runtime in its respective usages across the codebase. This is because
things like mpegaudio_tablegen are hard to do in 10^5 cycles or fewer at
runtime, leading to ambiguity and stalemating the removal of the
hardcoded-tables hackery.
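To make the exp2fi + M_SQRT2 idea above concrete, here is a rough,
untested sketch (exp2fi and exp2f_half are placeholder names, not an
existing API). Since alpha is 1 or 0.5 and everything else is an integer,
every argument is a multiple of 0.5, so the caller can pass a scaled
integer n with x = n/2:

    #include <math.h>
    #include <stdint.h>

    /* 2^n for integer n: bit-construct a normal float when possible,
     * otherwise fall back to ldexpf(). */
    static inline float exp2fi(int n)
    {
        union { uint32_t i; float f; } u;
        if (n < -126 || n > 127)
            return ldexpf(1.0f, n);
        u.i = (uint32_t)(n + 127) << 23; /* exponent bits, zero mantissa */
        return u.f;
    }

    /* 2^(n/2) for integer n: integer half via exp2fi(), plus a sqrt(2)
     * factor when n is odd. Relies on arithmetic right shift, so that
     * e.g. n = -3 gives exp2fi(-2) * sqrt(2) = 2^-1.5. */
    static inline float exp2f_half(int n)
    {
        float v = exp2fi(n >> 1);
        return (n & 1) ? v * (float)M_SQRT2 : v;
    }

A call like exp2f(q * 0.5f + 7.0f) would then become exp2f_half(q + 14):
no table, and the odd/even select is something compilers can usually turn
into a conditional move rather than a branch.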
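For reference, the single compare in the quoted suggestion works because
adding A shifts the valid window to start at 0, and the cast to unsigned
turns any negative sum into a huge value, so one test covers both bounds.
A hypothetical shape (the constants and cached_exp2 are made up for
illustration, and the table init is not shown):

    #include <math.h>

    #define TAB_MIN  (-32) /* hypothetical lowest cached input (-A)   */
    #define TAB_SIZE 96    /* hypothetical B: number of cached inputs */

    /* must be pre-filled with exp2_tab[i] = exp2f(i + TAB_MIN) */
    static float exp2_tab[TAB_SIZE];

    static inline float cached_exp2(int x)
    {
        /* x - TAB_MIN lands in [0, TAB_SIZE) iff x is in
         * [TAB_MIN, TAB_MIN + TAB_SIZE); anything else wraps to a
         * large unsigned value and takes the slow path. */
        if ((unsigned)(x - TAB_MIN) < TAB_SIZE)
            return exp2_tab[x - TAB_MIN];
        return exp2f(x);
    }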
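As an illustration of the table-sharing point, a sketch of what a shared
master table could look like (names and sizes are invented here, not
FFmpeg's actual tables): store one quarter period of sin at the finest
granularity any user needs, and let each user derive its own resolution
by striding, with symmetry supplying the other three quarters:

    #include <math.h>

    #define MASTER_BITS 16
    #define MASTER_SIZE (1 << MASTER_BITS) /* full period: 65536 */

    /* quarter-period master: sin(2*pi*i/65536) for 0 <= i <= 65536/4 */
    static float sin_master[MASTER_SIZE / 4 + 1];

    static void init_sin_master(void)
    {
        for (int i = 0; i <= MASTER_SIZE / 4; i++)
            sin_master[i] = sinf(2.0 * M_PI * i / MASTER_SIZE);
    }

    /* sin(2*pi*i/65536) for any i, via quarter-wave symmetry:
     * sin(pi - t) = sin(t) and sin(t + pi) = -sin(t). */
    static float master_sin(unsigned i)
    {
        i &= MASTER_SIZE - 1;
        if (i < MASTER_SIZE / 4)     return  sin_master[i];
        if (i < MASTER_SIZE / 2)     return  sin_master[MASTER_SIZE / 2 - i];
        if (i < 3 * MASTER_SIZE / 4) return -sin_master[i - MASTER_SIZE / 2];
        return -sin_master[MASTER_SIZE - i];
    }

    /* derive a codec's own 2^bits-point table by striding the master:
     * sin(2*pi*j/2^bits) = master_sin(j << (MASTER_BITS - bits)). */
    static void make_sin_table(float *tab, int bits)
    {
        for (int j = 0; j < (1 << bits); j++)
            tab[j] = master_sin((unsigned)j << (MASTER_BITS - bits));
    }

A 2048-point table is then make_sin_table(tab, 11), which reads every
32nd master entry, matching the 16*i / 32*i striding mentioned above.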
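For the write-contention concern with dynamic init, the usual pattern is
a once-guard rather than a bare mutex. A sketch with POSIX pthread_once,
reusing init_sin_master() from the sketch above (in-tree code would
presumably go through a wrapper so single-threaded builds stay
lock-free):

    #include <pthread.h>

    static pthread_once_t sin_master_once = PTHREAD_ONCE_INIT;

    /* Every codec calls this before touching sin_master: the init
     * routine runs exactly once, and its writes are guaranteed to be
     * visible to each caller by the time pthread_once() returns. */
    static void sin_master_init(void)
    {
        pthread_once(&sin_master_once, init_sin_master);
    }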
> or that code could be in a exp_sqrt2i() or something

Maybe. I just thought that by keeping the code common, one avoids
essentially duplicate functions, and it possibly keeps the binary size
smaller. This of course depends on the linker, header vs. source
implementation, etc., and is a minor consideration.

> just some random ideas...

Thanks for the ideas. I had thought about most of these things before
settling on the idea proposed above. In any case, the goal should be
solid incremental improvement: last-mile tweaks such as getting rid of
some branch penalties or optimizing for common inputs can always be done
later, and should not bog down the work now.

> [...]