On 2012-01-01 16:22:54 +0100, Christophe Gisquet wrote:
> 
> In fact the first patch is unneeded: dequantizing all 16 coefficients
> is considerably less efficient than dequantizing only the non-zero
> coeffs while decoding them (several libavcodec decoders do that). I'm
> questioning the whole point of rv34_dequant4x4, in fact. See end of
> mail for what I mean by that.

[...]
 
> This would have been mostly useful for people optimizing (besides me):
> the comment was intended to mean "yes, this happens, so don't bother
> checking whether it does, or trying to skip some processing by
> relying on such conditions". But OK, at this point, I should
> either write the exact reason or write nothing.
> 
> Anyway, I have attached a patch showing a relatively new path:
> dequantize only non-zero coefficients. This requires some invasive
> changes but this really is efficient:
> - reference for 5 sequences: 4.27 / 4.98 / 2.76 / 1.20 / 3.20
> - SSE2 dequant: 4.17 / 4.85 / 2.73 / 1.15 / 3.15
> - new dequant: 4.12 / 4.74 / 2.72 / 1.17 / 3.15
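
For readers skimming the thread, the idea quoted above can be sketched
roughly as follows. This is only an illustration, not the attached patch:
the helper names, the levels/pos arrays standing in for the entropy
decoder's output, and the (x * q + 8) >> 4 rounding are all assumptions
made for the example.

```c
#include <stdint.h>

/* Hypothetical helper: scale one coefficient by q with rounding.
 * The exact formula is an assumption made for this sketch. */
static inline int16_t dequant_coeff(int level, int q)
{
    return (int16_t)((level * q + 8) >> 4);
}

/* Baseline: a separate pass that touches all 16 entries of the 4x4
 * block, including the (usually many) zero ones. */
static void dequant4x4_full(int16_t block[16], int q)
{
    for (int i = 0; i < 16; i++)
        block[i] = dequant_coeff(block[i], q);
}

/* Alternative: dequantize each coefficient at the moment it is stored.
 * levels[] and pos[] are hypothetical stand-ins for what the entropy
 * decoder yields: n non-zero levels and their positions in the block.
 * The block is assumed to be zeroed beforehand. */
static void store_dequantized(int16_t block[16], const int16_t *levels,
                              const uint8_t *pos, int n, int q)
{
    for (int i = 0; i < n; i++)
        block[pos[i]] = dequant_coeff(levels[i], q);
}
```

With this rounding a zero level stays zero, so both paths produce the same
block; the second one just does n multiplications instead of 16.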

Seems to be ~1% faster on my memory-limited omap4/panda.

Janne