Justin Ruggles <[email protected]> writes:

> On 05/16/2011 03:15 PM, Måns Rullgård wrote:
>
>> Justin Ruggles <[email protected]> writes:
>> 
>>> On 05/15/2011 04:49 AM, Måns Rullgård wrote:
>>>
>>>> Justin Ruggles <[email protected]> writes:
>>>>
>>>>> On 05/14/2011 10:50 AM, Måns Rullgård wrote:
>>>>>
>>>>>> Diego Biurrun <[email protected]> writes:
>>>>>>
>>>>>>> On Sat, May 14, 2011 at 09:41:01AM +0100, Måns Rullgård wrote:
>>>>>>>> Justin Ruggles <[email protected]> writes:
>>>>>>>>
>>>>>>>>> This does all the actual bit counting as a final step.
>>>>>>>>> x86 benchmarks:
>>>>>>>>> 50% faster in function count_mantissa_bits()
>>>>>>>>> 16% faster in function bit_alloc()
>>>>>>>>> ---
>>>>>>>>>  libavcodec/ac3dsp.c              |   33 ++++++++--------
>>>>>>>>>  libavcodec/ac3dsp.h              |    4 +-
>>>>>>>>>  libavcodec/ac3enc.c              |   78 +++++++++++++++++++++-----------------
>>>>>>>>>  libavcodec/arm/Makefile          |    1 -
>>>>>>>>>  libavcodec/arm/ac3dsp_arm.S      |   52 -------------------------
>>>>>>>>>  libavcodec/arm/ac3dsp_init_arm.c |    2 -
>>>>>>>>>  6 files changed, 63 insertions(+), 107 deletions(-)
>>>>>>>>>  delete mode 100644 libavcodec/arm/ac3dsp_arm.S
>>>>>>>>> +static void count_mantissa_bits_update_ch(AC3EncodeContext *s, int ch,
>>>>>>>>> +                                          uint16_t mant_cnt[AC3_MAX_BLOCKS][16],
>>>>>>>>> +                                          int start, int end)
>>>>>>>>> +{
>>>>>>>>> +    int blk, i;
>>>>>>>>> +
>>>>>>>>> +    for (blk = 0; blk < AC3_MAX_BLOCKS; blk++) {
>>>>>>>>> +        uint8_t *bap = s->blocks[blk].exp_ref_block[ch]->bap[ch];
>>>>>>>>> +        for (i = start; i < end; i++)
>>>>>>>>> +            mant_cnt[blk][bap[i]]++;
>>>>>>>>
>>>>>>>> This loop will suck with gcc on ARM.
>>>>>>>
>>>>>>> I'm curious as to why, could you elaborate?
>>>>>>
>>>>>> Because gcc sucks, what else?  This particular suckage was the main
>>>>>> reason for writing that function in assembler at all.
>>>>>
>>>>> Could this be written in asm for ARM then?
>>>>
>>>> If the code is reorganised to allow this, yes.
>>>
>>> Would it help to just have the inner loop in asm?
>> 
>> The outer loop looks simple enough to write in asm too.  The pointer
>> chasing is a bit worrisome though.  Is there any way to flatten some of
>> that into an array instead?
>
> We could flatten bap into an array, and reset_block_bap() could be
> modified to set the pointers based on reference blocks. 

Would there be any downside to doing that, such as overhead in that
function instead of here?

> Then we would have:
>
> for (blk = 0; blk < AC3_MAX_BLOCKS; blk++) {
>     uint8_t *bap = s->ref_bap[ch][blk];
>     for (i = start; i < end; i++)
>         mant_cnt[blk][bap[i]]++;
> }

This looks more asm-friendly.
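
To make the comparison concrete, here is a minimal self-contained sketch of
the flattened variant. It is only an illustration under assumptions: the
SketchContext struct and the MAX_CHANNELS bound are hypothetical stand-ins
for the real AC3EncodeContext, and ref_bap is assumed to have been filled in
beforehand (for instance by a modified reset_block_bap()). The point is that
each outer iteration needs a single load from a flat table rather than a
chain of dependent loads through blocks[] and exp_ref_block[].

```c
#include <stdint.h>

/* Stand-in definitions for illustration only; the real encoder context
 * in libavcodec/ac3enc.c has many more fields. */
#define AC3_MAX_BLOCKS 6
#define MAX_CHANNELS   7   /* hypothetical bound for this sketch */

typedef struct SketchContext {
    /* ref_bap[ch][blk] points at the bap array of the reference block
     * for that channel and block. Assumed to be set up in advance. */
    uint8_t *ref_bap[MAX_CHANNELS][AC3_MAX_BLOCKS];
} SketchContext;

/* Histogram the bap values of one channel, per block, over [start, end). */
static void count_mantissa_bits_update_ch(SketchContext *s, int ch,
                                          uint16_t mant_cnt[AC3_MAX_BLOCKS][16],
                                          int start, int end)
{
    int blk, i;

    for (blk = 0; blk < AC3_MAX_BLOCKS; blk++) {
        /* One load from a flat table, no pointer chasing. */
        uint8_t *bap = s->ref_bap[ch][blk];
        for (i = start; i < end; i++)
            mant_cnt[blk][bap[i]]++;
    }
}
```

With this layout, an asm version would only need the base of ref_bap[ch],
the mant_cnt pointer, and the start/end indices as arguments.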

-- 
Måns Rullgård
[email protected]