On 2012-01-01 16:22:54 +0100, Christophe Gisquet wrote:
>
> In fact the first patch is unneeded: dequantizing all 16 coefficients
> is quite less efficient than dequantizing only non-zero coeffs while
> decoding them (several libavcodec decoders do that). I'm questioning
> the whole point of rv34_dequant4x4 in fact. See end of mail for what I
> mean by that.
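For anyone following along, here is a rough sketch of the two approaches being compared. It is not the code from the patch: CoeffToken, dequant_all and decode_and_dequant are hypothetical names, the token array stands in for a real bitstream reader, and the (level * q + 8) >> 4 rounding with a single quantizer q is assumed purely for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the bitstream reader: one already-parsed
 * non-zero coefficient of a 4x4 block. */
typedef struct {
    uint8_t pos;    /* index 0..15 inside the block */
    int16_t level;  /* quantized value, never zero */
} CoeffToken;

/* Approach A: the block was filled by the decoder beforehand; a second
 * pass multiplies all 16 entries, including the zeros.
 * Note: >> on a negative int is implementation-defined in C; this
 * sketch assumes the usual arithmetic shift. */
static void dequant_all(int16_t block[16], int q)
{
    for (int i = 0; i < 16; i++)
        block[i] = (int16_t)((block[i] * q + 8) >> 4);
}

/* Approach B: dequantize each coefficient at the moment it is decoded,
 * so only the n non-zero entries cost a multiply. A zero entry would
 * dequantize to (0 * q + 8) >> 4 == 0 anyway, so the results match. */
static void decode_and_dequant(int16_t block[16], const CoeffToken *tok,
                               int n, int q)
{
    memset(block, 0, 16 * sizeof(block[0]));
    for (int i = 0; i < n; i++)
        block[tok[i].pos] = (int16_t)((tok[i].level * q + 8) >> 4);
}
```

With two tokens, approach B does two multiplies where approach A does sixteen, which is presumably where the gain comes from on sparse blocks.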
[...]

> This would have been mostly useful for people optimizing (beside me):
> the comment was intended to mean "yes, this happens, so don't bother
> checking if it does, and thus how you can skip on some processing,
> relying on some such conditions. But OK, at this point, I should
> either write the exact reason or write nothing.
>
> Anyway, I have attached a patch showing a relatively new path:
> dequantize only non-zero coefficients. This requires some invasive
> changes but this really is efficient:
> - reference for 5 sequences: 4.27 / 4.98 / 2.76 / 1.20 / 3.20
> - SSE2 dequant: 4.17 / 4.85 / 2.73 / 1.15 / 3.15
> - new dequant: 4.12 / 4.74 / 2.72 / 1.17 / 3.15

seems to be ~1% faster on my memory limited omap4/panda

Janne
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
