scpr: optimize shift loop.

James Almer Fri, 08 Sep 2017 17:02:17 -0700

On 9/8/2017 8:49 PM, Michael Niedermayer wrote:
> On Fri, Sep 08, 2017 at 06:43:06PM -0300, James Almer wrote:
>> On 9/8/2017 6:29 PM, Michael Niedermayer wrote:
>>> Speeds code up from 50sec to 15sec
>>>
>>> Fixes Timeout
>>> Fixes: 3242/clusterfuzz-testcase-5811951672229888
>>>
>>> Found-by: continuous fuzzing process 
>>> https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
>>> Signed-off-by: Michael Niedermayer <[email protected]>
>>> ---
>>>  libavcodec/scpr.c | 11 ++++++++++-
>>>  1 file changed, 10 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/libavcodec/scpr.c b/libavcodec/scpr.c
>>> index 37fbe7a106..2ef63a7bf8 100644
>>> --- a/libavcodec/scpr.c
>>> +++ b/libavcodec/scpr.c
>>> @@ -827,7 +827,16 @@ static int decode_frame(AVCodecContext *avctx, void 
>>> *data, int *got_frame,
>>>              return ret;
>>>  
>>>          for (y = 0; y < avctx->height; y++) {
>>> -            for (x = 0; x < avctx->width * 4; x++) {
>>> +            if (!(((uintptr_t)dst) & 7)) {
>>> +                uint64_t *dst64 = (uint64_t *)dst;
>>> +                int w = avctx->width>>1;
>>> +                for (x = 0; x < w; x++) {
>>> +                    dst64[x] = (dst64[x] << 3) & 0xFCFCFCFCFCFCFCFCULL;
>>
>> Shouldn't this be used only if HAVE_FAST_64BIT is true, and a version
>> shifting four bytes at a time used otherwise? That's how we do almost
>> everywhere else.
> 
> The compiler would break the 64bit into two 32bit automatically.
> I can write an explicit version if that is wanted ?
> it seemed overkill for this here though


I guess it's simple enough that the compiler can figure that out, yeah.
Should be ok then.

> 
> 
>>
>> The chances for anyone bothering writing simd for this decoder are
>> almost none, so adding C optimized loops is ok in this case.
>>
>>> +                }
>>> +                x *= 8;
>>> +            } else
>>> +                x = 0;
>>
>> How does this fix the timeout if the new code is only run if the pointer
>> is eight byte aligned? (or four once you add that).
> 
> The timeout is IIRC currently defined as 30sec or so on the platform
> the fuzzer runs on, data is aligned in that case.
> 
> You could imagine a platform where data is not aligned, yes
> on that patform the patch would not improve the time decoding takes.
> Simiarly you could imagine a CPU that supports only 8bit operations,
> or just a slower CPU.
> 
> This patch adds some optimizations that makes decoding faster which
> gets us over the threshold for this particular sample and a normal
> modern desktop platform.
> 
> Its quite possible another scpr file will still hit the threshold
> and certainly another threshold or other hw could trigger it
> still with this sample
> 
> I would very much prefer a more universal fix ...
> But i didnt see one and making that loop 3 times as fast is great on
> its own.

Ok. As i said nobody is going to write simd for this, so adding C
optimized code is fine.
_______________________________________________
ffmpeg-devel mailing list
[email protected]
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [PATCH 3/3] avcodec/scpr: optimize shift loop.

Reply via email to