On Fri, Sep 08, 2017 at 06:43:06PM -0300, James Almer wrote:
> On 9/8/2017 6:29 PM, Michael Niedermayer wrote:
> > Speeds code up from 50sec to 15sec
> >
> > Fixes Timeout
> > Fixes: 3242/clusterfuzz-testcase-5811951672229888
> >
> > Found-by: continuous fuzzing process
> > https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
> > Signed-off-by: Michael Niedermayer <mich...@niedermayer.cc>
> > ---
> >  libavcodec/scpr.c | 11 ++++++++++-
> >  1 file changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/libavcodec/scpr.c b/libavcodec/scpr.c
> > index 37fbe7a106..2ef63a7bf8 100644
> > --- a/libavcodec/scpr.c
> > +++ b/libavcodec/scpr.c
> > @@ -827,7 +827,16 @@ static int decode_frame(AVCodecContext *avctx, void *data, int *got_frame,
> >              return ret;
> >
> >          for (y = 0; y < avctx->height; y++) {
> > -            for (x = 0; x < avctx->width * 4; x++) {
> > +            if (!(((uintptr_t)dst) & 7)) {
> > +                uint64_t *dst64 = (uint64_t *)dst;
> > +                int w = avctx->width>>1;
> > +                for (x = 0; x < w; x++) {
> > +                    dst64[x] = (dst64[x] << 3) & 0xFCFCFCFCFCFCFCFCULL;
>
> Shouldn't this be used only if HAVE_FAST_64BIT is true, and a version
> shifting four bytes at a time used otherwise? That's how we do almost
> everywhere else.
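For context, a minimal, self-contained sketch of the word-at-a-time (SWAR)
idea the patch uses. It is not the scpr.c code: the function name is
illustrative, and it uses a fully general 0xF8 per-byte mask (clearing all
three low bits of each byte) rather than the patch's 0xFC, so it matches the
byte-at-a-time loop for arbitrary input values.

    #include <stdint.h>
    #include <stddef.h>

    /* Shift every byte of buf left by 3, as the per-byte loop does.
     * When the buffer is 8-byte aligned, shift a whole 64-bit word at a
     * time and mask off the three low bits of each byte, which would
     * otherwise hold bits spilled in from the neighbouring byte. */
    static void shl3_bytes(uint8_t *buf, size_t len)
    {
        size_t i = 0;

        if (!((uintptr_t)buf & 7)) {
            uint64_t *buf64 = (uint64_t *)buf;
            size_t nwords = len >> 3;

            for (size_t w = 0; w < nwords; w++)
                buf64[w] = (buf64[w] << 3) & 0xF8F8F8F8F8F8F8F8ULL;
            i = nwords << 3;
        }

        for (; i < len; i++)   /* tail bytes, or unaligned fallback */
            buf[i] = buf[i] << 3;
    }

As in the patch, this relies on the uint8_t-to-uint64_t pointer cast being
acceptable in practice; a strictly conforming version would go through memcpy.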
The compiler would break the 64-bit operations into two 32-bit operations
automatically. I can write an explicit version if that is wanted; it seemed
like overkill here, though.

> The chances of anyone bothering to write SIMD for this decoder are
> almost none, so adding C optimized loops is ok in this case.
>
> > +                }
> > +                x *= 8;
> > +            } else
> > +                x = 0;
>
> How does this fix the timeout if the new code is only run if the pointer
> is eight byte aligned? (or four once you add that).

The timeout is, IIRC, currently defined as 30 seconds or so on the platform
the fuzzer runs on, and the data is aligned in that case. You could imagine a
platform where the data is not aligned; yes, on that platform the patch would
not improve the decoding time. Similarly, you could imagine a CPU that
supports only 8-bit operations, or just a slower CPU. This patch adds some
optimizations that make decoding faster, which gets us back under the
threshold for this particular sample on a normal modern desktop platform.
It's quite possible another scpr file will still hit the threshold, and
certainly another threshold or other hardware could still trigger it with
this sample.

I would very much prefer a more universal fix ... but I didn't see one, and
making that loop 3 times as fast is great on its own.

[...]

--
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Asymptotically faster algorithms should always be preferred if you have
asymptotical amounts of data
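For reference, a hedged sketch of what the explicit variant discussed above
(guarded by the configure macro HAVE_FAST_64BIT, with a four-bytes-at-a-time
path otherwise) might look like. This was not posted or committed; apart from
HAVE_FAST_64BIT itself, all names are illustrative, and the mask is the fully
general 0xF8 per-byte mask.

    #include <stdint.h>
    #include <stddef.h>
    /* #include "config.h"  -- in FFmpeg, defines HAVE_FAST_64BIT as 0 or 1 */

    #if HAVE_FAST_64BIT
    typedef uint64_t swar_word;
    #define SWAR_MASK UINT64_C(0xF8F8F8F8F8F8F8F8)
    #else
    typedef uint32_t swar_word;
    #define SWAR_MASK UINT32_C(0xF8F8F8F8)
    #endif

    /* Same per-byte left shift by 3, but the word size is chosen at
     * compile time instead of relying on the compiler to split 64-bit
     * operations on 32-bit targets. */
    static void shl3_bytes_word(uint8_t *buf, size_t len)
    {
        size_t i = 0;

        if (!((uintptr_t)buf & (sizeof(swar_word) - 1))) {
            swar_word *w = (swar_word *)buf;
            size_t nwords = len / sizeof(swar_word);

            for (size_t n = 0; n < nwords; n++)
                w[n] = (w[n] << 3) & SWAR_MASK;
            i = nwords * sizeof(swar_word);
        }

        for (; i < len; i++)   /* tail bytes, or unaligned fallback */
            buf[i] = buf[i] << 3;
    }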