On Sun, Jan 01, 2012 at 04:22:54PM +0100, Christophe Gisquet wrote:
> Hi,
> 
> In fact the first patch is unneeded: dequantizing all 16 coefficients
> is considerably less efficient than dequantizing only the non-zero
> coeffs while decoding them (several libavcodec decoders do that). I'm
> questioning the whole point of rv34_dequant4x4, in fact. See the end of
> this mail for what I mean by that.
> 
> 2011/12/31 Kostya Shishkov <[email protected]>:
> > It would be easier to review as a lesser number of patches though.
> 
> Yes, I only presented the whole set to show where I was heading. But the
> more I look at the code, the more optimization opportunities I see.
> For instance, dequant, idct and residue addition are done in separate
> passes over blocks (sometimes checking again the cbp), which is a bit
> inefficient.
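
To make the point about separate passes concrete, here is a minimal sketch of the two loop structures. It is not taken from the patch: NB_BLOCKS, the block layout in dst, the plain multiply used as dequantization and the Hadamard-style butterfly standing in for the real inverse transform are all assumptions chosen to keep the example short.

```c
#include <stdint.h>

#define NB_BLOCKS 4   /* hypothetical: 4x4 blocks handled per call */

static inline uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Illustrative dequantization: a plain multiply by the quantizer. */
static void dequant4x4(int16_t *blk, int q)
{
    for (int i = 0; i < 16; i++)
        blk[i] = (int16_t)(blk[i] * q);
}

/* Placeholder transform (Hadamard-style butterfly), rows then columns.
 * It stands in for the codec's real inverse transform. */
static void inv_transform4x4(int16_t *blk)
{
    for (int pass = 0; pass < 2; pass++) {
        int step = pass ? 4 : 1;   /* distance between elements of a line */
        int line = pass ? 1 : 4;   /* distance between lines */
        for (int i = 0; i < 4; i++) {
            int16_t *p = blk + i * line;
            int a = p[0] + p[2 * step], b = p[0] - p[2 * step];
            int c = p[step] + p[3 * step], d = p[step] - p[3 * step];
            p[0]        = (int16_t)(a + c);
            p[step]     = (int16_t)(b + d);
            p[2 * step] = (int16_t)(b - d);
            p[3 * step] = (int16_t)(a - c);
        }
    }
}

static void add_residue4x4(uint8_t *dst, int stride, const int16_t *blk)
{
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            dst[y * stride + x] = clip_u8(dst[y * stride + x] + blk[4 * y + x]);
}

/* Three passes: the cbp bit of every block is tested three times. */
static void reconstruct_three_pass(uint8_t *dst, int stride,
                                   int16_t blocks[NB_BLOCKS][16],
                                   unsigned cbp, int q)
{
    for (int i = 0; i < NB_BLOCKS; i++)
        if (cbp & (1u << i)) dequant4x4(blocks[i], q);
    for (int i = 0; i < NB_BLOCKS; i++)
        if (cbp & (1u << i)) inv_transform4x4(blocks[i]);
    for (int i = 0; i < NB_BLOCKS; i++)
        if (cbp & (1u << i)) add_residue4x4(dst + 4 * i, stride, blocks[i]);
}

/* Fused: one cbp test per block, and the block stays hot in cache. */
static void reconstruct_fused(uint8_t *dst, int stride,
                              int16_t blocks[NB_BLOCKS][16],
                              unsigned cbp, int q)
{
    for (int i = 0; i < NB_BLOCKS; i++) {
        if (!(cbp & (1u << i)))
            continue;
        dequant4x4(blocks[i], q);
        inv_transform4x4(blocks[i]);
        add_residue4x4(dst + 4 * i, stride, blocks[i]);
    }
}
```

Both functions produce the same pixels; the fused version only changes the order of the work, which is why it should be safe to introduce step by step.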
> 
> So I'll indeed go step by step.
> 
> > It's better to verify analytically though.
> 
> (as I said, let's see next patch)
> 
> > Sorry, this one should be done differently - drop #ifdef here and add it to
> > x86/rv34dsp.c. The logic is that only code in libavcodec/x86 should care
> > about Yasm present or not (there's one exception but it's really ugly).
> 
> Got it, will do so the next time the opportunity arises.
> 
> > weird indentation and does it really make any speed difference?
> 
> I don't think it makes a big difference indeed (I kind of see an
> improvement, but that could be within measurement error), but the point
> was mostly to get the conditions needed to perform DC-only processing of
> the blocks (which happens quite frequently and is thus a speed gain).
> 
> This also raises the question of whether and how to handle the case of
> blocks with only DC and AC0-2 (I think several libavcodec decoders do
> that instead). But this is a question for later, when we'll get back
> to that DC-only handling.
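
As an illustration of why DC-only handling pays off, here is a small sketch under stated assumptions: the sign table H is a Hadamard-style placeholder for the real inverse transform, the (x + 8) >> 4 normalization is invented for the example, and has_ac is a hypothetical flag assumed to come from the coefficient decoder. When only the DC coefficient is set, the transform output is one constant, so it can be computed once and added to all 16 pixels.

```c
#include <stdint.h>

static inline uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Placeholder basis: row u holds the signs of basis function u. */
static const int8_t H[4][4] = {
    { 1,  1,  1,  1 },
    { 1,  1, -1, -1 },
    { 1, -1, -1,  1 },
    { 1, -1,  1, -1 },
};

/* Full path: direct 2-D inverse transform, then add to the prediction.
 * Assumes arithmetic right shift of negative values. */
static void add_full4x4(uint8_t *dst, int stride, const int16_t *blk)
{
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++) {
            int sum = 0;
            for (int u = 0; u < 4; u++)
                for (int v = 0; v < 4; v++)
                    sum += H[u][y] * H[v][x] * blk[4 * u + v];
            dst[y * stride + x] =
                clip_u8(dst[y * stride + x] + ((sum + 8) >> 4));
        }
}

/* DC-only path: every output sample equals the same scaled DC value. */
static void add_dc4x4(uint8_t *dst, int stride, int dc)
{
    dc = (dc + 8) >> 4;   /* scale once instead of 16 times */
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            dst[y * stride + x] = clip_u8(dst[y * stride + x] + dc);
}

/* has_ac (hypothetical) is set by the decoder when any AC coeff is coded. */
static void reconstruct4x4(uint8_t *dst, int stride,
                           const int16_t *blk, int has_ac)
{
    if (has_ac)
        add_full4x4(dst, stride, blk);
    else
        add_dc4x4(dst, stride, blk[0]);
}
```

The shortcut is only valid if the flag is reliable, which is why getting the condition out of the decoding loop matters more than the shortcut itself.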
> 
> >> +      // Does happen
> >
> > //So what?
> 
> This would have been mostly useful for people optimizing (besides me):
> the comment was intended to mean "yes, this happens, so don't bother
> checking whether it does in the hope of skipping some processing by
> relying on such a condition". But OK, at this point, I should
> either write the exact reason or write nothing.
 
Write nothing then - it's clearer.

> Anyway, I have attached a patch showing a relatively new path:
> dequantize only non-zero coefficients. This requires some invasive
> changes but this really is efficient:
> - reference for 5 sequences: 4.27 / 4.98 / 2.76 / 1.20 / 3.20
> - SSE2 dequant: 4.17 / 4.85 / 2.73 / 1.15 / 3.15
> - new dequant: 4.12 / 4.74 / 2.72 / 1.17 / 3.15
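
For readers following along, the idea of dequantizing only the non-zero coefficients can be sketched as below. This is a simplified illustration, not the attached patch: CoeffSym, dequant_one and its (level * q + 8) >> 4 scaling are hypothetical stand-ins for whatever the bitstream reader and quantizer tables actually provide.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical decoded symbol: position and quantized level. */
typedef struct {
    uint8_t pos;     /* index 0..15 inside the 4x4 block */
    int16_t level;   /* non-zero quantized value */
} CoeffSym;

/* Illustrative rounded dequantization of a single coefficient.
 * Assumes arithmetic right shift of negative values. */
static inline int16_t dequant_one(int level, int q)
{
    return (int16_t)((level * q + 8) >> 4);
}

/* Separate pass: store the levels, then dequantize all 16 entries,
 * including the zeros (a zero level stays zero). */
static void decode_then_dequant(int16_t block[16], const CoeffSym *syms,
                                int n, int q_dc, int q_ac)
{
    memset(block, 0, 16 * sizeof(*block));
    for (int i = 0; i < n; i++)
        block[syms[i].pos] = syms[i].level;
    block[0] = dequant_one(block[0], q_dc);
    for (int i = 1; i < 16; i++)
        block[i] = dequant_one(block[i], q_ac);
}

/* Dequantize while decoding: only the n coded coefficients are touched.
 * Returns 1 if any AC coefficient was coded, 0 otherwise. */
static int decode_with_dequant(int16_t block[16], const CoeffSym *syms,
                               int n, int q_dc, int q_ac)
{
    int has_ac = 0;
    memset(block, 0, 16 * sizeof(*block));
    for (int i = 0; i < n; i++) {
        int pos = syms[i].pos;
        block[pos] = dequant_one(syms[i].level, pos ? q_ac : q_dc);
        has_ac |= pos;
    }
    return has_ac != 0;
}
```

A side benefit of the second form is that the decoder learns for free whether the block has any AC coefficients, which is exactly the condition a DC-only path needs.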

I'll wait for NEON results. The patch itself looks decent.
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
