Hi,

In fact the first patch is unneeded: dequantizing all 16 coefficients is noticeably less efficient than dequantizing only the non-zero coefficients while decoding them (several libavcodec decoders do that). I'm in fact questioning the whole point of rv34_dequant4x4. See the end of this mail for what I mean by that.

2011/12/31 Kostya Shishkov <[email protected]>:
> It would be easier to review as a lesser number of patches though.

Yes, I only presented the whole thing to show where I was heading. But the more I look at the code, the more optimization opportunities I see. For instance, dequant, IDCT and residue addition are done in separate passes over the blocks (sometimes checking the cbp again), which is a bit inefficient. So I'll indeed go step by step.

> It's better to verify analytically though.

(As I said, let's see the next patch.)

> Sorry, this one should be done differently - drop #ifdef here and add it to
> x86/rv34dsp.c. The logic is that only code in libavcodec/x86 should care about
> Yasm present or not (there's one exception but it's really ugly).

Got it, will do so the next time the opportunity appears.

> weird indentation and does it really make any speed difference?

I don't think it makes a big difference indeed (I seem to see an improvement, but that could be within measurement error). The point was mostly to establish the conditions for DC-only processing of the blocks (which happens quite frequently and is thus a speed gain). This also raises the question of whether and how to handle blocks with only DC and AC0-2 (I think several libavcodec decoders do that instead). But this is a question for later, when we get back to that DC-only handling.

>> + // Does happen
>
> //So what?

This would have been mostly useful for people optimizing (besides me): the comment was intended to mean "yes, this happens, so don't bother checking whether it does, and hence how you can skip some processing by relying on such conditions". But OK, at this point I should either write the exact reason or write nothing.

Anyway, I have attached a patch showing a relatively new path: dequantize only non-zero coefficients.
This requires some invasive changes, but this really is efficient:
- reference for 5 sequences: 4.27 / 4.98 / 2.76 / 1.20 / 3.20
- SSE2 dequant:              4.17 / 4.85 / 2.73 / 1.15 / 3.15
- new dequant:               4.12 / 4.74 / 2.72 / 1.17 / 3.15
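
To make the idea concrete, here is a rough, simplified sketch of the two approaches. It is not the attached patch: the CoeffToken type stands in for the real bitstream reader, and the function names, the linear block layout and the (x * q + 8) >> 4 rounding are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the entropy decoder output:
 * one entry per non-zero coefficient of a 4x4 block. */
typedef struct {
    uint8_t pos;    /* index 0..15 inside the block, 0 is DC */
    int16_t level;  /* quantized value, never 0 */
} CoeffToken;

/* Separate pass: scale all 16 entries, zero or not.
 * The rounding used here is an assumption for this sketch. */
static void dequant_all(int16_t block[16], int q_dc, int q_ac)
{
    block[0] = (block[0] * q_dc + 8) >> 4;
    for (int i = 1; i < 16; i++)
        block[i] = (block[i] * q_ac + 8) >> 4;
}

/* Two passes: store the decoded levels, then dequantize the whole block. */
static void decode_then_dequant(int16_t block[16], const CoeffToken *tok,
                                int n, int q_dc, int q_ac)
{
    memset(block, 0, 16 * sizeof(*block));
    for (int i = 0; i < n; i++) {
        assert(tok[i].pos < 16);
        block[tok[i].pos] = tok[i].level;
    }
    dequant_all(block, q_dc, q_ac);
}

/* One pass: dequantize each coefficient as it is decoded, so zero
 * coefficients are never touched. Returns non-zero if any AC
 * coefficient was seen, so the caller could pick a DC-only path. */
static int decode_and_dequant(int16_t block[16], const CoeffToken *tok,
                              int n, int q_dc, int q_ac)
{
    int has_ac = 0;

    memset(block, 0, 16 * sizeof(*block));
    for (int i = 0; i < n; i++) {
        assert(tok[i].pos < 16);
        int q = tok[i].pos ? q_ac : q_dc;
        block[tok[i].pos] = (tok[i].level * q + 8) >> 4;
        has_ac |= tok[i].pos != 0;
    }
    return has_ac;
}
```

Both functions produce the same block, since a zero coefficient stays zero after scaling; the second one simply does the multiply only where it matters. The returned flag is the kind of condition I was after for the DC-only processing mentioned above.
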
rv34_change_dequant.diff
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
