On Sat, Dec 12, 2015 at 1:17 PM, Michael Niedermayer <michae...@gmx.at> wrote: [...]
>> >> > The exp2f expressions are:
>> >> > exp2f(sbr->data[0].env_facs_q[e][k] * alpha + 7.0f);
>> >> > exp2f((pan_offset - sbr->data[1].env_facs_q[e][k]) * alpha);
>> >> > exp2f(NOISE_FLOOR_OFFSET - sbr->data[0].noise_facs_q[e][k] + 1);
>> >> > exp2f(12 - sbr->data[1].noise_facs_q[e][k]);
>> >> > exp2f(alpha * sbr->data[ch].env_facs_q[e][k] + 6.0f);
>> >> > exp2f(NOISE_FLOOR_OFFSET - sbr->data[ch].noise_facs_q[e][k]);
>> >> >
>> >> > Here alpha is 1 or 0.5, pan_offset is 12 or 24, and NOISE_FLOOR_OFFSET
>> >> > is 6. After patch 3 of this series, env_facs_q is in the range from 0
>> >> > to 127, and noise_facs_q is already limited to the range from 0 to 30.
>> >> >
>> >> > So x should always be in the range -300..300, or so.
>> >>
>> >> Very good, thanks a lot.
>> >>
>> >> Based on the above range, my idea is to not even use a LUT, but to use
>> >> something like exp2fi followed by a multiplication by M_SQRT2 depending
>> >> on even or odd.
>> >
>> > conditional operations can, due to branch misprediction, be potentially
>> > rather slow
>>
>> I think it will still be far faster than exp2f, and in the absence of
>> hard numbers, I view this as a far better approach than a large (~300
>> element) LUT. Of course, the proof and extent of this will need to
>> wait for actual benches.
>
> alternatively one could do a
>
> if (x+A < (unsigned)B)
>     LUT[x+A]
> else
>     exp2whatever(x)
>
> the range in practice should be much smaller than +-300

That still uses a branch, so unless for whatever reason the numbers tend
to concentrate in an interval (which you believe, but I am agnostic about
since I don't know AAC), this is code complexity for little gain.
Furthermore, that B then becomes a "voodoo" constant, and performance may
vary across files depending on how concentrated the inputs are. I
personally don't like such things unless they are very well justified,
after all other easy, uniform methods of optimization are exhausted.

> also the LUT can possibly be shared between codecs

This is something for which there is plenty of low-hanging fruit across
FFmpeg. The best example I know of is the sin/cos tables used across dct,
fft, and other areas: a lot of them can be derived from sin(2*pi*i/65536)
for 0 <= i <= 65536/4, with cheap runtime derivation via index striding
and symmetry exploitation (e.g. 16*i, 32*i, flipping the sign). I will
work on this only after other things are sorted out, so in the meantime
it is up for grabs.

Complications come from threading issues in the case of dynamic init,
since one needs to place a lock of some sort to avoid write contention.
And --enable-hardcoded-tables does not help at all; it only distracts
from the key issues, since one then needs to reason about both cases.

Unfortunately, the question of static vs. dynamic tables is something
that can easily get bikeshedded to death. I am trying to improve the
default dynamic initialization to help resolve such things; see e.g. the
aac_tablegen work. My goal is to first improve the runtime generation
code and, once that is done, ask the ML for decisions where the answer is
relatively clear cut. I might be too optimistic here about removing
--enable-hardcoded-tables entirely via hard decisions on static vs.
runtime in its respective usages across the codebase. This is because
things like mpegaudio_tablegen are hard to do in 10^5 cycles or fewer at
runtime, leading to ambiguity and stalemating the removal of the
hardcoded-tables hackery.
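To make the exp2fi + M_SQRT2 idea above concrete, here is a rough,
untested sketch (exp2fi and exp2f_half are placeholder names, not an
existing API). Since alpha is 1 or 0.5 and everything else is an integer,
every argument is a multiple of 0.5, so the caller can pass a scaled
integer n with x = n/2:

    #include <math.h>
    #include <stdint.h>

    /* 2^n for integer n: bit-construct a normal float when possible,
     * otherwise fall back to ldexpf(). */
    static inline float exp2fi(int n)
    {
        union { uint32_t i; float f; } u;
        if (n < -126 || n > 127)
            return ldexpf(1.0f, n);
        u.i = (uint32_t)(n + 127) << 23; /* exponent bits, zero mantissa */
        return u.f;
    }

    /* 2^(n/2) for integer n: integer half via exp2fi(), plus a sqrt(2)
     * factor when n is odd. Relies on arithmetic right shift, so that
     * e.g. n = -3 gives exp2fi(-2) * sqrt(2) = 2^-1.5. */
    static inline float exp2f_half(int n)
    {
        float v = exp2fi(n >> 1);
        return (n & 1) ? v * (float)M_SQRT2 : v;
    }

A call like exp2f(q * 0.5f + 7.0f) would then become exp2f_half(q + 14):
no table, and the odd/even select is something compilers can usually turn
into a conditional move rather than a branch.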
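For reference, the single compare in the quoted suggestion works because
adding A shifts the valid window to start at 0, and the cast to unsigned
turns any negative sum into a huge value, so one test covers both bounds.
A hypothetical shape (the constants and cached_exp2 are made up for
illustration, and the table init is not shown):

    #include <math.h>

    #define TAB_MIN  (-32) /* hypothetical lowest cached input (-A)   */
    #define TAB_SIZE 96    /* hypothetical B: number of cached inputs */

    /* must be pre-filled with exp2_tab[i] = exp2f(i + TAB_MIN) */
    static float exp2_tab[TAB_SIZE];

    static inline float cached_exp2(int x)
    {
        /* x - TAB_MIN lands in [0, TAB_SIZE) iff x is in
         * [TAB_MIN, TAB_MIN + TAB_SIZE); anything else wraps to a
         * large unsigned value and takes the slow path. */
        if ((unsigned)(x - TAB_MIN) < TAB_SIZE)
            return exp2_tab[x - TAB_MIN];
        return exp2f(x);
    }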
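As an illustration of the table-sharing point, a sketch of what a shared
master table could look like (names and sizes are invented here, not
FFmpeg's actual tables): store one quarter period of sin at the finest
granularity any user needs, and let each user derive its own resolution
by striding, with symmetry supplying the other three quarters:

    #include <math.h>

    #define MASTER_BITS 16
    #define MASTER_SIZE (1 << MASTER_BITS) /* full period: 65536 */

    /* quarter-period master: sin(2*pi*i/65536) for 0 <= i <= 65536/4 */
    static float sin_master[MASTER_SIZE / 4 + 1];

    static void init_sin_master(void)
    {
        for (int i = 0; i <= MASTER_SIZE / 4; i++)
            sin_master[i] = sinf(2.0 * M_PI * i / MASTER_SIZE);
    }

    /* sin(2*pi*i/65536) for any i, via quarter-wave symmetry:
     * sin(pi - t) = sin(t) and sin(t + pi) = -sin(t). */
    static float master_sin(unsigned i)
    {
        i &= MASTER_SIZE - 1;
        if (i < MASTER_SIZE / 4)     return  sin_master[i];
        if (i < MASTER_SIZE / 2)     return  sin_master[MASTER_SIZE / 2 - i];
        if (i < 3 * MASTER_SIZE / 4) return -sin_master[i - MASTER_SIZE / 2];
        return -sin_master[MASTER_SIZE - i];
    }

    /* derive a codec's own 2^bits-point table by striding the master:
     * sin(2*pi*j/2^bits) = master_sin(j << (MASTER_BITS - bits)). */
    static void make_sin_table(float *tab, int bits)
    {
        for (int j = 0; j < (1 << bits); j++)
            tab[j] = master_sin((unsigned)j << (MASTER_BITS - bits));
    }

A 2048-point table is then make_sin_table(tab, 11), which reads every
32nd master entry, matching the 16*i / 32*i striding mentioned above.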
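For the write-contention concern with dynamic init, the usual pattern is
a once-guard rather than a bare mutex. A sketch with POSIX pthread_once,
reusing init_sin_master() from the sketch above (in-tree code would
presumably go through a wrapper so single-threaded builds stay
lock-free):

    #include <pthread.h>

    static pthread_once_t sin_master_once = PTHREAD_ONCE_INIT;

    /* Every codec calls this before touching sin_master: the init
     * routine runs exactly once, and its writes are guaranteed to be
     * visible to each caller by the time pthread_once() returns. */
    static void sin_master_init(void)
    {
        pthread_once(&sin_master_once, init_sin_master);
    }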
> or that code could be in a exp_sqrt2i() or something

Maybe. I just thought that by keeping the code common, one avoids
essentially duplicate functions, and it possibly keeps the binary size
smaller. This of course depends on the linker, header vs. source
implementation, etc., and is a minor consideration.

> just some random ideas...

Thanks for the ideas. I had thought about most of these things before
settling on the idea proposed above. In any case, the goal should be
solid incremental improvement: last-mile tweaks such as getting rid of
some branch penalties or optimizing for common inputs can always be done
later, and should not bog down the work now.

> [...]