On 05/16/2011 03:15 PM, Måns Rullgård wrote:

> Justin Ruggles <[email protected]> writes:
> 
>> On 05/15/2011 04:49 AM, Måns Rullgård wrote:
>>
>>> Justin Ruggles <[email protected]> writes:
>>>
>>>> On 05/14/2011 10:50 AM, Måns Rullgård wrote:
>>>>
>>>>> Diego Biurrun <[email protected]> writes:
>>>>>
>>>>>> On Sat, May 14, 2011 at 09:41:01AM +0100, Måns Rullgård wrote:
>>>>>>> Justin Ruggles <[email protected]> writes:
>>>>>>>
>>>>>>>> This does all the actual bit counting as a final step.
>>>>>>>> x86 benchmarks:
>>>>>>>> 50% faster in function count_mantissa_bits()
>>>>>>>> 16% faster in function bit_alloc()
>>>>>>>> ---
>>>>>>>>  libavcodec/ac3dsp.c              |   33 ++++++++--------
>>>>>>>>  libavcodec/ac3dsp.h              |    4 +-
>>>>>>>>  libavcodec/ac3enc.c              |   78 +++++++++++++++++++++-----------------
>>>>>>>>  libavcodec/arm/Makefile          |    1 -
>>>>>>>>  libavcodec/arm/ac3dsp_arm.S      |   52 -------------------------
>>>>>>>>  libavcodec/arm/ac3dsp_init_arm.c |    2 -
>>>>>>>>  6 files changed, 63 insertions(+), 107 deletions(-)
>>>>>>>>  delete mode 100644 libavcodec/arm/ac3dsp_arm.S
>>>>>>>> +static void count_mantissa_bits_update_ch(AC3EncodeContext *s, int ch,
>>>>>>>> +                                          uint16_t mant_cnt[AC3_MAX_BLOCKS][16],
>>>>>>>> +                                          int start, int end)
>>>>>>>> +{
>>>>>>>> +    int blk, i;
>>>>>>>> +
>>>>>>>> +    for (blk = 0; blk < AC3_MAX_BLOCKS; blk++) {
>>>>>>>> +        uint8_t *bap = s->blocks[blk].exp_ref_block[ch]->bap[ch];
>>>>>>>> +        for (i = start; i < end; i++)
>>>>>>>> +            mant_cnt[blk][bap[i]]++;
>>>>>>>
>>>>>>> This loop will suck with gcc on ARM.
>>>>>>
>>>>>> I'm curious as to why, could you elaborate?
>>>>>
>>>>> Because gcc sucks, what else?  This particular suckage was the main
>>>>> reason for writing that function in assembler at all.
>>>>
>>>> Could this be written in asm for ARM then?
>>>
>>> If the code is reorganised to allow this, yes.
>>
>> Would it help to just have the inner loop in asm?
> 
> The outer loop looks simple enough to write in asm too.  The pointer
> chasing is a bit worrisome though.  Is there any way to flatten some of
> that into an array instead?


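We could flatten bap into a per-channel, per-block array of pointers
(s->ref_bap[ch][blk]), and reset_block_bap() could be modified to set
those pointers based on the reference blocks.  Then the loop would only
need one array lookup per block: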

for (blk = 0; blk < AC3_MAX_BLOCKS; blk++) {
    uint8_t *bap = s->ref_bap[ch][blk];
    for (i = start; i < end; i++)
        mant_cnt[blk][bap[i]]++;
}
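
To make the idea concrete, here is a rough, self-contained sketch of how
the flattened layout might look.  It is not the actual ac3enc.c code:
SketchContext, bap_storage, the integer exp_ref_block index table, the
sketch_* function names, and the MAX_* sizes are all hypothetical
stand-ins, and only ref_bap[ch][blk] and the counting loop come from the
proposal above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_BLOCKS   6    /* stands in for AC3_MAX_BLOCKS */
#define MAX_CHANNELS 7    /* hypothetical size for this sketch */
#define MAX_COEFS    256  /* hypothetical size for this sketch */

/* Hypothetical, simplified stand-in for AC3EncodeContext. */
typedef struct SketchContext {
    /* bap values per block and channel; each value must be in 0..15 */
    uint8_t  bap_storage[MAX_BLOCKS][MAX_CHANNELS][MAX_COEFS];
    /* index of the block whose exponents (and bap) each block reuses */
    int      exp_ref_block[MAX_CHANNELS][MAX_BLOCKS];
    /* flattened pointers: one direct bap pointer per channel and block */
    uint8_t *ref_bap[MAX_CHANNELS][MAX_BLOCKS];
} SketchContext;

/* Resolve the reference-block indirection once, ahead of the hot loop. */
static void sketch_reset_block_bap(SketchContext *s)
{
    int ch, blk;

    for (ch = 0; ch < MAX_CHANNELS; ch++)
        for (blk = 0; blk < MAX_BLOCKS; blk++)
            s->ref_bap[ch][blk] =
                s->bap_storage[s->exp_ref_block[ch][blk]][ch];
}

/* Counting loop as proposed: a single array lookup per block. */
static void sketch_count_mantissa_bits_update_ch(const SketchContext *s, int ch,
                                                 uint16_t mant_cnt[MAX_BLOCKS][16],
                                                 int start, int end)
{
    int blk, i;

    assert(start >= 0 && start <= end && end <= MAX_COEFS);

    for (blk = 0; blk < MAX_BLOCKS; blk++) {
        const uint8_t *bap = s->ref_bap[ch][blk];
        for (i = start; i < end; i++)
            mant_cnt[blk][bap[i]]++;
    }
}
```

The point of the sketch is that the chain
s->blocks[blk].exp_ref_block[ch]->bap[ch] is resolved once per frame in
the reset step, so an asm version of the counting loop would only have to
walk one pointer table.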

-Justin