Justin Ruggles <[email protected]> writes:
> On 05/16/2011 03:15 PM, Måns Rullgård wrote:
>
>> Justin Ruggles <[email protected]> writes:
>>
>>> On 05/15/2011 04:49 AM, Måns Rullgård wrote:
>>>
>>>> Justin Ruggles <[email protected]> writes:
>>>>
>>>>> On 05/14/2011 10:50 AM, Måns Rullgård wrote:
>>>>>
>>>>>> Diego Biurrun <[email protected]> writes:
>>>>>>
>>>>>>> On Sat, May 14, 2011 at 09:41:01AM +0100, Måns Rullgård wrote:
>>>>>>>> Justin Ruggles <[email protected]> writes:
>>>>>>>>
>>>>>>>>> This does all the actual bit counting as a final step.
>>>>>>>>> x86 benchmarks:
>>>>>>>>> 50% faster in function count_mantissa_bits()
>>>>>>>>> 16% faster in function bit_alloc()
>>>>>>>>> ---
>>>>>>>>> libavcodec/ac3dsp.c | 33 ++++++++--------
>>>>>>>>> libavcodec/ac3dsp.h | 4 +-
>>>>>>>>> libavcodec/ac3enc.c | 78 +++++++++++++++++++++-----------------
>>>>>>>>> libavcodec/arm/Makefile | 1 -
>>>>>>>>> libavcodec/arm/ac3dsp_arm.S | 52 -------------------------
>>>>>>>>> libavcodec/arm/ac3dsp_init_arm.c | 2 -
>>>>>>>>> 6 files changed, 63 insertions(+), 107 deletions(-)
>>>>>>>>> delete mode 100644 libavcodec/arm/ac3dsp_arm.S
>>>>>>>>> +static void count_mantissa_bits_update_ch(AC3EncodeContext *s, int ch,
>>>>>>>>> +                                          uint16_t mant_cnt[AC3_MAX_BLOCKS][16],
>>>>>>>>> +                                          int start, int end)
>>>>>>>>> +{
>>>>>>>>> +    int blk, i;
>>>>>>>>> +
>>>>>>>>> +    for (blk = 0; blk < AC3_MAX_BLOCKS; blk++) {
>>>>>>>>> +        uint8_t *bap = s->blocks[blk].exp_ref_block[ch]->bap[ch];
>>>>>>>>> +        for (i = start; i < end; i++)
>>>>>>>>> +            mant_cnt[blk][bap[i]]++;
>>>>>>>>
>>>>>>>> This loop will suck with gcc on ARM.
>>>>>>>
>>>>>>> I'm curious as to why, could you elaborate?
>>>>>>
>>>>>> Because gcc sucks, what else? This particular suckage was the main
>>>>>> reason for writing that function in assembler at all.
>>>>>
>>>>> Could this be written in asm for ARM then?
>>>>
>>>> If the code is reorganised to allow this, yes.
>>>
>>> Would it help to just have the inner loop in asm?
>>
>> The outer loop looks simple enough to write in asm too. The pointer
>> chasing is a bit worrisome though. Is there any way to flatten some of
>> that into an array instead?
>
> We could flatten bap into an array, and reset_block_bap() could be
> modified to set the pointers based on reference blocks.

Would there be any downside to doing that, such as moving the overhead
into that function instead of having it here?

> Then we would have:
>
> for (blk = 0; blk < AC3_MAX_BLOCKS; blk++) {
>     uint8_t *bap = s->ref_bap[ch][blk];
>     for (i = start; i < end; i++)
>         mant_cnt[blk][bap[i]]++;
> }

This looks more asm-friendly.
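
To make the flattened form concrete, here is a rough standalone sketch of
the loop above. It is only an illustration under some assumptions: the
function name count_mantissa_bits_update_ch_flat is made up, the ref_bap
argument stands for one channel's slice of the proposed s->ref_bap[ch]
table, and AC3_MAX_BLOCKS is defined locally so the snippet compiles
outside the tree. How reset_block_bap() would fill the table is not
shown.

```c
#include <assert.h>
#include <stdint.h>

#define AC3_MAX_BLOCKS 6   /* defined locally so the sketch is standalone */

/*
 * Hypothetical flattened variant: ref_bap[blk] already points at the bap
 * array of the reference block for this channel, so the loop needs one
 * load per block instead of chasing blocks[blk].exp_ref_block[ch]->bap[ch].
 */
static void count_mantissa_bits_update_ch_flat(
        uint8_t *const ref_bap[AC3_MAX_BLOCKS],
        uint16_t mant_cnt[AC3_MAX_BLOCKS][16],
        int start, int end)
{
    int blk, i;

    assert(start >= 0 && start <= end);

    for (blk = 0; blk < AC3_MAX_BLOCKS; blk++) {
        const uint8_t *bap = ref_bap[blk];
        /* bap values are 0..15, so they index mant_cnt[blk] directly */
        for (i = start; i < end; i++)
            mant_cnt[blk][bap[i]]++;
    }
}
```

With the table set up once per frame, the function only sees an array of
pointers and an array of counters, which should be easy to hand over to
an asm version taking the same arguments.
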
--
Måns Rullgård
[email protected]