On 9/8/2017 8:49 PM, Michael Niedermayer wrote: > On Fri, Sep 08, 2017 at 06:43:06PM -0300, James Almer wrote: >> On 9/8/2017 6:29 PM, Michael Niedermayer wrote: >>> Speeds code up from 50sec to 15sec >>> >>> Fixes Timeout >>> Fixes: 3242/clusterfuzz-testcase-5811951672229888 >>> >>> Found-by: continuous fuzzing process >>> https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg >>> Signed-off-by: Michael Niedermayer <mich...@niedermayer.cc> >>> --- >>> libavcodec/scpr.c | 11 ++++++++++- >>> 1 file changed, 10 insertions(+), 1 deletion(-) >>> >>> diff --git a/libavcodec/scpr.c b/libavcodec/scpr.c >>> index 37fbe7a106..2ef63a7bf8 100644 >>> --- a/libavcodec/scpr.c >>> +++ b/libavcodec/scpr.c >>> @@ -827,7 +827,16 @@ static int decode_frame(AVCodecContext *avctx, void >>> *data, int *got_frame, >>> return ret; >>> >>> for (y = 0; y < avctx->height; y++) { >>> - for (x = 0; x < avctx->width * 4; x++) { >>> + if (!(((uintptr_t)dst) & 7)) { >>> + uint64_t *dst64 = (uint64_t *)dst; >>> + int w = avctx->width>>1; >>> + for (x = 0; x < w; x++) { >>> + dst64[x] = (dst64[x] << 3) & 0xFCFCFCFCFCFCFCFCULL; >> >> Shouldn't this be used only if HAVE_FAST_64BIT is true, and a version >> shifting four bytes at a time used otherwise? That's how we do almost >> everywhere else. > > The compiler would break the 64bit into two 32bit automatically. > I can write an explicit version if that is wanted ? > it seemed overkill for this here though
I guess it's simple enough that the compiler can figure that out, yeah. Should be ok then. > > >> >> The chances for anyone bothering writing simd for this decoder are >> almost none, so adding C optimized loops is ok in this case. >> >>> + } >>> + x *= 8; >>> + } else >>> + x = 0; >> >> How does this fix the timeout if the new code is only run if the pointer >> is eight byte aligned? (or four once you add that). > > The timeout is IIRC currently defined as 30sec or so on the platform > the fuzzer runs on, data is aligned in that case. > > You could imagine a platform where data is not aligned, yes > on that patform the patch would not improve the time decoding takes. > Simiarly you could imagine a CPU that supports only 8bit operations, > or just a slower CPU. > > This patch adds some optimizations that makes decoding faster which > gets us over the threshold for this particular sample and a normal > modern desktop platform. > > Its quite possible another scpr file will still hit the threshold > and certainly another threshold or other hw could trigger it > still with this sample > > I would very much prefer a more universal fix ... > But i didnt see one and making that loop 3 times as fast is great on > its own. Ok. As i said nobody is going to write simd for this, so adding C optimized code is fine. _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel