On Thu, Nov 19, 2015 at 04:50:54PM +0100, Michael Niedermayer wrote:
> On Thu, Nov 19, 2015 at 11:48:53AM +0100, Clément Bœsch wrote:
> > From: Matthieu Bouron <matthieu.bou...@stupeflix.com>
> >
> > Signed-off-by: Matthieu Bouron <matthieu.bou...@stupeflix.com>
> > Signed-off-by: Clément Bœsch <clem...@stupeflix.com>
> >
> > ---
> > The function takes about 29ms with a 1080p source (testsrc2) on a
> > Cortex-A8. However, 16ms (more than half the time!) is spent in the vst2
> > call. Any suggestions on how to speed this up?
> >
> > Also, the reference code seems to cause some kind of ringing, while our
> > ASM doesn't:
> > http://b.pkh.me/nv12-rgba-ref.png
> > http://b.pkh.me/nv12-rgba-neon.png
>
> what did you test exactly here?

./ffmpeg -f lavfi -i testsrc2 -vf format=nv12,format=rgba -ss 1 -frames:v 1 -y nv12-rgba-ref.png

(on ARM though, and with -cpuflags 0)

> but there are several codepaths for rgb output, one uses LUTs and
> not all use full resolution chroma
>

Yeah, we noticed...

Note: on x86 there is some yuv2rgb MMX code, but it's not used by the
command above because it doesn't handle nv12 (only yuv420 and friends), so
the chroma issue is reproducible there too (the LUT path is used).

> >
> > Last, we noticed that the y_offset is scaled to 1<<9 for some reason we
> > couldn't figure out. Hopefully we're doing it correctly here.
> > [...]
> > +.macro compute_half_line dst half_y ofmt
> > +    vmovl.u8        q7, \half_y                 @ 8px of Y
> > +    vdup.16         q5, r9
> > +    vsub.s16        q7, q5
> > +    vmull.s16       q1, d14, d0                 @ q1 = (srcY - y_offset) * y_coeff (left)
> > +    vmull.s16       q2, d15, d0                 @ q2 = (srcY - y_offset) * y_coeff (right)
>
> if you do something like (srcY) * y_coeff - y_offset2
> then you could keep a bit more precision in the requested brightness
> correction

The code in swscale/output.c seems to always use the form we use here. Is
it on purpose?

> OTOH maybe you want to be bitexact to some existing codepath
>

Right... I suppose we don't have many tests with custom
brightness/contrast/saturation. Should I expose them in vf_scale and see
how much breaks? :)

> either way, your patch passes fate with arm qemu here so i have
> no objections if you also tested it and it works
> but maybe others have more comments about the asm ...
>

-- 
Clément B.
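
For reference, the two orderings Michael mentions can be compared with a small C sketch. This is not swscale code: the helper names, the Q13 coefficient scale and the sample values (a luma coefficient of 1.164 and a brightness offset of 16.4 Y steps) are hypothetical, picked only to show where the rounding happens.

```c
#include <stdint.h>

/* Hypothetical Q13 fixed-point scale for the luma coefficient; swscale's
 * real scaling (including the 1<<9 y_offset factor) is not modelled here. */
#define COEFF_SHIFT 13

/* Round a non-negative double to the nearest integer (avoids libm). */
static int32_t round_pos(double x)
{
    return (int32_t)(x + 0.5);
}

/* Ordering used in the NEON patch: subtract the offset in the Y domain,
 * then multiply. y_offset must be a whole number of Y steps, so any
 * fractional brightness correction is rounded away before the multiply. */
static int32_t luma_sub_then_mul(int32_t y, double offset, double coeff)
{
    int32_t y_offset = round_pos(offset);
    int32_t y_coeff  = round_pos(coeff * (1 << COEFF_SHIFT));
    return (y - y_offset) * y_coeff;
}

/* Suggested ordering: multiply first, then subtract a pre-scaled offset.
 * y_offset2 = offset * y_coeff is rounded only at the product's scale,
 * so the fractional part of the brightness correction survives. */
static int32_t luma_mul_then_sub(int32_t y, double offset, double coeff)
{
    int32_t y_coeff   = round_pos(coeff * (1 << COEFF_SHIFT));
    int32_t y_offset2 = round_pos(offset * y_coeff);
    return y * y_coeff - y_offset2;
}

/* Reference: same quantized coefficient, unrounded offset. */
static double luma_exact(int32_t y, double offset, double coeff)
{
    int32_t y_coeff = round_pos(coeff * (1 << COEFF_SHIFT));
    return (y - offset) * y_coeff;
}
```

With y = 100, the subtract-first ordering returns 800940 because the offset is rounded to 16, while the multiply-first ordering returns 797126, matching the reference computed with the unrounded offset. In the NEON code this would presumably mean widening with vmull.s16 first and subtracting a 32-bit offset from the product afterwards, at the cost of no longer being bitexact with the existing C path.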
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel