On Fri, Sep 08, 2017 at 06:43:06PM -0300, James Almer wrote:
> On 9/8/2017 6:29 PM, Michael Niedermayer wrote:
> > Speeds code up from 50sec to 15sec
> >
> > Fixes Timeout
> > Fixes: 3242/clusterfuzz-testcase-5811951672229888
> >
> > Found-by: continuous fuzzing process
> > https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
> > Signed-off-by: Michael Niedermayer <mich...@niedermayer.cc>
> > ---
> >  libavcodec/scpr.c | 11 ++++++++++-
> >  1 file changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/libavcodec/scpr.c b/libavcodec/scpr.c
> > index 37fbe7a106..2ef63a7bf8 100644
> > --- a/libavcodec/scpr.c
> > +++ b/libavcodec/scpr.c
> > @@ -827,7 +827,16 @@ static int decode_frame(AVCodecContext *avctx, void *data, int *got_frame,
> >              return ret;
> >
> >          for (y = 0; y < avctx->height; y++) {
> > -            for (x = 0; x < avctx->width * 4; x++) {
> > +            if (!(((uintptr_t)dst) & 7)) {
> > +                uint64_t *dst64 = (uint64_t *)dst;
> > +                int w = avctx->width>>1;
> > +                for (x = 0; x < w; x++) {
> > +                    dst64[x] = (dst64[x] << 3) & 0xFCFCFCFCFCFCFCFCULL;
>
> Shouldn't this be used only if HAVE_FAST_64BIT is true, and a version
> shifting four bytes at a time used otherwise? That's how we do almost
> everywhere else.
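For context, a minimal, self-contained sketch of the word-at-a-time (SWAR)
idea the patch uses. It is not the scpr.c code: the function name is
illustrative, and it uses a fully general 0xF8 per-byte mask (clearing all
three low bits of each byte) rather than the patch's 0xFC, so it matches the
byte-at-a-time loop for arbitrary input values.

    #include <stdint.h>
    #include <stddef.h>

    /* Shift every byte of buf left by 3, as the per-byte loop does.
     * When the buffer is 8-byte aligned, shift a whole 64-bit word at a
     * time and mask off the three low bits of each byte, which would
     * otherwise hold bits spilled in from the neighbouring byte. */
    static void shl3_bytes(uint8_t *buf, size_t len)
    {
        size_t i = 0;

        if (!((uintptr_t)buf & 7)) {
            uint64_t *buf64 = (uint64_t *)buf;
            size_t nwords = len >> 3;

            for (size_t w = 0; w < nwords; w++)
                buf64[w] = (buf64[w] << 3) & 0xF8F8F8F8F8F8F8F8ULL;
            i = nwords << 3;
        }

        for (; i < len; i++)   /* tail bytes, or unaligned fallback */
            buf[i] = buf[i] << 3;
    }

As in the patch, this relies on the uint8_t-to-uint64_t pointer cast being
acceptable in practice; a strictly conforming version would go through memcpy.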
The compiler would break the 64-bit operations into two 32-bit operations
automatically. I can write an explicit version if that is wanted; it seemed
like overkill here, though.

> The chances of anyone bothering to write SIMD for this decoder are
> almost none, so adding C optimized loops is ok in this case.
>
> > +                }
> > +                x *= 8;
> > +            } else
> > +                x = 0;
>
> How does this fix the timeout if the new code is only run if the pointer
> is eight byte aligned? (or four once you add that).

The timeout is, IIRC, currently defined as 30 seconds or so on the platform
the fuzzer runs on, and the data is aligned in that case. You could imagine a
platform where the data is not aligned; yes, on that platform the patch would
not improve the decoding time. Similarly, you could imagine a CPU that
supports only 8-bit operations, or just a slower CPU. This patch adds some
optimizations that make decoding faster, which gets us back under the
threshold for this particular sample on a normal modern desktop platform.
It's quite possible another scpr file will still hit the threshold, and
certainly another threshold or other hardware could still trigger it with
this sample.

I would very much prefer a more universal fix ... but I didn't see one, and
making that loop 3 times as fast is great on its own.

[...]

--
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Asymptotically faster algorithms should always be preferred if you have
asymptotical amounts of data
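For reference, a hedged sketch of what the explicit variant discussed above
(guarded by the configure macro HAVE_FAST_64BIT, with a four-bytes-at-a-time
path otherwise) might look like. This was not posted or committed; apart from
HAVE_FAST_64BIT itself, all names are illustrative, and the mask is the fully
general 0xF8 per-byte mask.

    #include <stdint.h>
    #include <stddef.h>
    /* #include "config.h"  -- in FFmpeg, defines HAVE_FAST_64BIT as 0 or 1 */

    #if HAVE_FAST_64BIT
    typedef uint64_t swar_word;
    #define SWAR_MASK UINT64_C(0xF8F8F8F8F8F8F8F8)
    #else
    typedef uint32_t swar_word;
    #define SWAR_MASK UINT32_C(0xF8F8F8F8)
    #endif

    /* Same per-byte left shift by 3, but the word size is chosen at
     * compile time instead of relying on the compiler to split 64-bit
     * operations on 32-bit targets. */
    static void shl3_bytes_word(uint8_t *buf, size_t len)
    {
        size_t i = 0;

        if (!((uintptr_t)buf & (sizeof(swar_word) - 1))) {
            swar_word *w = (swar_word *)buf;
            size_t nwords = len / sizeof(swar_word);

            for (size_t n = 0; n < nwords; n++)
                w[n] = (w[n] << 3) & SWAR_MASK;
            i = nwords * sizeof(swar_word);
        }

        for (; i < len; i++)   /* tail bytes, or unaligned fallback */
            buf[i] = buf[i] << 3;
    }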