On Sun, Jan 01, 2012 at 04:22:54PM +0100, Christophe Gisquet wrote:
> Hi,
>
> In fact the first patch is unneeded: dequantizing all 16 coefficients
> is quite less efficient than dequantizing only non-zero coeffs while
> decoding them (several libavcodec decoders do that). I'm questioning
> the whole point of rv34_dequant4x4 in fact. See end of mail for what I
> mean by that.
>
> 2011/12/31 Kostya Shishkov <[email protected]>:
> > It would be easier to review as a lesser number of patches though.
>
> Yes I only presented the whole to see where I was heading. But the
> more I look at the code, the more optimization opportunities I see.
> For instance, dequant, idct and residue addition are done in separate
> passes over blocks (sometimes checking again the cbp), which is a bit
> inefficient.
>
> So I'll indeed go step by step.
>
> > It's better to verify analytically though.
>
> (as I said, let's see next patch)
>
> > Sorry, this one should be done differently - drop #ifdef here and add it to
> > x86/rv34dsp.c. The logic is that only code in libavcodec/x86 should care
> > about Yasm present or not (there's one exception but it's really ugly).
>
> Got it, will do so next time the opportunity appears again.
>
> > weird indentation and does it really make any speed difference?
>
> I don't think it makes a big difference indeed (I kind of see an
> improvement, but that could be within measure error), but the point
> was mostly to get conditions to perform dc-only processing of the
> blocks (which happens quite frequently and is thus a speed gain).
>
> This also raises the question of whether and how to handle the case of
> blocks with only DC and AC0-2 (I think several libavcodec decoders do
> that instead). But this is a question for later, when we'll get back
> to that DC-only handling.
>
> >> +    // Does happen
> >
> > //So what?
>
> This would have been mostly useful for people optimizing (beside me):
> the comment was intended to mean "yes, this happens, so don't bother
> checking if it does, and thus how you can skip on some processing,
> relying on some such conditions". But OK, at this point, I should
> either write the exact reason or write nothing.

Write nothing then - it's clearer.

> Anyway, I have attached a patch showing a relatively new path:
> dequantize only non-zero coefficients. This requires some invasive
> changes but this really is efficient:
> - reference for 5 sequences: 4.27 / 4.98 / 2.76 / 1.20 / 3.20
> - SSE2 dequant: 4.17 / 4.85 / 2.73 / 1.15 / 3.15
> - new dequant: 4.12 / 4.74 / 2.72 / 1.17 / 3.15

I'll wait for NEON results. The patch itself looks decent.
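
For readers following the thread, here is a rough sketch of the two approaches being compared: dequantizing all 16 coefficients in a separate pass versus dequantizing each non-zero coefficient as it is decoded. This is not the attached patch and not the actual libavcodec code. The CoeffToken type, the function names and the (level * q + 8) >> 4 rounding are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical decoded coefficient: position in the 4x4 block
 * (0..15, 0 is DC) and its non-zero quantized level. */
typedef struct {
    uint8_t pos;
    int16_t level;
} CoeffToken;

/* Rounding dequantization of one level. The formula is an assumption
 * for this sketch; it relies on arithmetic right shift for negative
 * values, which mainstream compilers provide. */
static int16_t dequant_one(int16_t level, int q)
{
    return (int16_t)((level * q + 8) >> 4);
}

/* Two-pass variant: store the levels, then dequantize all 16 entries,
 * including the ones that are zero. */
static void decode_then_dequant_all(int16_t block[16], const CoeffToken *tok,
                                    int n, int q_dc, int q_ac)
{
    int i;
    memset(block, 0, 16 * sizeof(block[0]));
    for (i = 0; i < n; i++) {
        assert(tok[i].pos < 16);
        block[tok[i].pos] = tok[i].level;
    }
    block[0] = dequant_one(block[0], q_dc);
    for (i = 1; i < 16; i++)
        block[i] = dequant_one(block[i], q_ac);
}

/* Fused variant: dequantize each coefficient as it is stored, so zero
 * entries are never touched. Returns 1 if any AC token was coded, which
 * a caller could use to choose a DC-only path. */
static int decode_and_dequant_nonzero(int16_t block[16], const CoeffToken *tok,
                                      int n, int q_dc, int q_ac)
{
    int i, has_ac = 0;
    memset(block, 0, 16 * sizeof(block[0]));
    for (i = 0; i < n; i++) {
        int q;
        assert(tok[i].pos < 16);
        q = tok[i].pos ? q_ac : q_dc;
        block[tok[i].pos] = dequant_one(tok[i].level, q);
        has_ac |= tok[i].pos != 0;
    }
    return has_ac;
}
```

Both variants produce the same block because a zero level dequantizes to zero under this rounding. The fused one does work proportional to the number of coded coefficients, and its return value gives the "DC only" condition discussed earlier in the thread without rescanning the block.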
