On 05/16/2011 03:15 PM, Måns Rullgård wrote:
> Justin Ruggles <[email protected]> writes:
>
>> On 05/15/2011 04:49 AM, Måns Rullgård wrote:
>>
>>> Justin Ruggles <[email protected]> writes:
>>>
>>>> On 05/14/2011 10:50 AM, Måns Rullgård wrote:
>>>>
>>>>> Diego Biurrun <[email protected]> writes:
>>>>>
>>>>>> On Sat, May 14, 2011 at 09:41:01AM +0100, Måns Rullgård wrote:
>>>>>>> Justin Ruggles <[email protected]> writes:
>>>>>>>
>>>>>>>> This does all the actual bit counting as a final step.
>>>>>>>> x86 benchmarks:
>>>>>>>> 50% faster in function count_mantissa_bits()
>>>>>>>> 16% faster in function bit_alloc()
>>>>>>>> ---
>>>>>>>> libavcodec/ac3dsp.c | 33 ++++++++--------
>>>>>>>> libavcodec/ac3dsp.h | 4 +-
>>>>>>>> libavcodec/ac3enc.c | 78 +++++++++++++++++++++-----------------
>>>>>>>> libavcodec/arm/Makefile | 1 -
>>>>>>>> libavcodec/arm/ac3dsp_arm.S | 52 -------------------------
>>>>>>>> libavcodec/arm/ac3dsp_init_arm.c | 2 -
>>>>>>>> 6 files changed, 63 insertions(+), 107 deletions(-)
>>>>>>>> delete mode 100644 libavcodec/arm/ac3dsp_arm.S
>>>>>>>> +static void count_mantissa_bits_update_ch(AC3EncodeContext *s, int ch,
>>>>>>>> + uint16_t mant_cnt[AC3_MAX_BLOCKS][16],
>>>>>>>> + int start, int end)
>>>>>>>> +{
>>>>>>>> + int blk, i;
>>>>>>>> +
>>>>>>>> + for (blk = 0; blk < AC3_MAX_BLOCKS; blk++) {
>>>>>>>> + uint8_t *bap = s->blocks[blk].exp_ref_block[ch]->bap[ch];
>>>>>>>> + for (i = start; i < end; i++)
>>>>>>>> + mant_cnt[blk][bap[i]]++;
>>>>>>>
>>>>>>> This loop will suck with gcc on ARM.
>>>>>>
>>>>>> I'm curious as to why, could you elaborate?
>>>>>
>>>>> Because gcc sucks, what else? This particular suckage was the main
>>>>> reason for writing that function in assembler at all.
>>>>
>>>> Could this be written in asm for ARM then?
>>>
>>> If the code is reorganised to allow this, yes.
>>
>> Would it help to just have the inner loop in asm?
>
> The outer loop looks simple enough to write in asm too. The pointer
> chasing is a bit worrisome though. Is there any way to flatten some of
> that into an array instead?

We could flatten bap into an array, and reset_block_bap() could be
modified to set the pointers based on reference blocks. Then we would have:

    for (blk = 0; blk < AC3_MAX_BLOCKS; blk++) {
        uint8_t *bap = s->ref_bap[ch][blk];
        for (i = start; i < end; i++)
            mant_cnt[blk][bap[i]]++;
    }

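
To make the idea concrete, here is a rough, self-contained sketch of how the
flattened layout could look. It is not the actual patch: SketchContext, the
sketch_* functions, the MAX_* constants, and the use of block indices (rather
than block pointers) for exp_ref_block are all assumptions made for
illustration. The point is only that the pointer chasing happens once, when
the table is filled, and the counting loop then reads from a flat
[ch][blk] table.

```c
#include <stdint.h>

/* Sketch-only constants; they stand in for the real AC3_MAX_* values. */
#define MAX_BLOCKS   6
#define MAX_CHANNELS 7
#define MAX_COEFS    256

/* Hypothetical, cut-down context holding only what the loop needs. */
typedef struct SketchContext {
    /* per-block, per-channel bit allocation pointers */
    uint8_t  bap[MAX_BLOCKS][MAX_CHANNELS][MAX_COEFS];
    /* index of the block whose bap each (ch, blk) reuses (assumption:
     * the real encoder stores a block pointer here, not an index) */
    int      exp_ref_block[MAX_CHANNELS][MAX_BLOCKS];
    /* flattened table: resolved bap pointer for each (ch, blk) */
    uint8_t *ref_bap[MAX_CHANNELS][MAX_BLOCKS];
} SketchContext;

/* Resolve the reference blocks once, so the hot loop does no chasing. */
static void sketch_reset_ref_bap(SketchContext *s)
{
    int ch, blk;

    for (ch = 0; ch < MAX_CHANNELS; ch++)
        for (blk = 0; blk < MAX_BLOCKS; blk++)
            s->ref_bap[ch][blk] = s->bap[s->exp_ref_block[ch][blk]][ch];
}

/* Histogram the bap values of one channel over [start, end) per block. */
static void sketch_count_mantissa_bits_update_ch(SketchContext *s, int ch,
                                                 uint16_t mant_cnt[MAX_BLOCKS][16],
                                                 int start, int end)
{
    int blk, i;

    for (blk = 0; blk < MAX_BLOCKS; blk++) {
        const uint8_t *bap = s->ref_bap[ch][blk];
        for (i = start; i < end; i++)
            mant_cnt[blk][bap[i]]++;
    }
}
```

With this arrangement the table would presumably need to be refreshed
whenever the reference blocks change, which is why doing it in
reset_block_bap() seems like the natural place.
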
-Justin