Hi,

In fact the first patch is unneeded: dequantizing all 16 coefficients
is considerably less efficient than dequantizing only the non-zero
coeffs while decoding them (several libavcodec decoders do that). I'm
even questioning the whole point of rv34_dequant4x4. See the end of
this mail for what I mean by that.

2011/12/31 Kostya Shishkov <[email protected]>:
> It would be easier to review as a lesser number of patches though.

Yes, I only presented the whole set to show where I was heading. But the
more I look at the code, the more optimization opportunities I see.
For instance, dequant, idct and residue addition are done in separate
passes over the blocks (sometimes checking the cbp again), which is a
bit inefficient.

So I'll indeed go step by step.

> It's better to verify analytically though.

(as I said, let's see next patch)

> Sorry, this one should be done differently - drop #ifdef here and add it to
> x86/rv34dsp.c. The logic is that only code in libavcodec/x86 should care about
> Yasm present or not (there's one exception but it's really ugly).

Got it, will do so the next time the opportunity arises.

> weird indentation and does it really make any speed difference?

I don't think it makes a big difference indeed (I kind of see an
improvement, but that could be within measurement error), but the point
was mostly to get the conditions needed to perform dc-only processing
of the blocks (which happens quite frequently and is thus a speed gain).

This also raises the question of whether and how to handle the case of
blocks with only DC and AC0-2 (I think several libavcodec decoders do
that instead). But this is a question for later, when we get back
to that DC-only handling.
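
As an illustration only (this is not code from the patches), a DC-only
path could look roughly like the sketch below. The names add_dc_4x4 and
clip_u8 are made up for the example, and dc is assumed to be already
dequantized and scaled the way the full inverse transform would scale it.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp a value to the 8-bit pixel range. */
static inline uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/*
 * Hypothetical DC-only path: when a 4x4 block has no AC coefficients,
 * the inverse transform output is the same value for all 16 samples,
 * so the transform is skipped and that one value is added to the
 * prediction. The codec-specific scaling of dc is assumed to have been
 * applied by the caller and is not shown here.
 */
void add_dc_4x4(uint8_t *dst, ptrdiff_t stride, int dc)
{
    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 4; x++)
            dst[x] = clip_u8(dst[x] + dc);
        dst += stride;
    }
}
```

The caller would take this path only when coefficient decoding has
already established that no AC coefficient was coded, so that no extra
scan of the block is needed to make the decision.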

>> +      // Does happen
>
> //So what?

This would mostly have been useful for other people optimizing this
code: the comment was intended to mean "yes, this case does happen, so
don't bother checking whether it does, or trying to skip some
processing by relying on such conditions". But OK, at this point I
should either write the exact reason or write nothing.

Anyway, I have attached a patch showing a relatively new approach:
dequantizing only the non-zero coefficients. This requires some invasive
changes, but it really is efficient:
- reference for 5 sequences: 4.27 / 4.98 / 2.76 / 1.20 / 3.20
- SSE2 dequant: 4.17 / 4.85 / 2.73 / 1.15 / 3.15
- new dequant: 4.12 / 4.74 / 2.72 / 1.17 / 3.15
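
For illustration, the difference between the two approaches can be
sketched as below. This is not the attached patch: the (pos, level)
input format, the function names and the rounding formula
(level * q + 8) >> 4 are assumptions made for the example.

```c
#include <stdint.h>
#include <string.h>

/*
 * Assumed rounding dequantizer. A zero level stays zero, which is why
 * skipping zero coefficients gives the same result. Right-shifting a
 * negative value is implementation-defined in C; an arithmetic shift
 * is assumed, as on common compilers.
 */
static inline int16_t dequant_coeff(int level, int q)
{
    return (int16_t)((level * q + 8) >> 4);
}

/* Baseline: store the raw levels, then dequantize all 16 positions. */
void decode_then_dequant(int16_t block[16], const uint8_t *pos,
                         const int8_t *level, int n, int q_dc, int q_ac)
{
    memset(block, 0, 16 * sizeof(*block));
    for (int i = 0; i < n; i++)
        block[pos[i]] = level[i];
    block[0] = dequant_coeff(block[0], q_dc);
    for (int i = 1; i < 16; i++)
        block[i] = dequant_coeff(block[i], q_ac);
}

/* Alternative: dequantize each non-zero coefficient as it is decoded. */
void decode_with_dequant(int16_t block[16], const uint8_t *pos,
                         const int8_t *level, int n, int q_dc, int q_ac)
{
    memset(block, 0, 16 * sizeof(*block));
    for (int i = 0; i < n; i++)
        block[pos[i]] = dequant_coeff(level[i], pos[i] ? q_ac : q_dc);
}
```

Both functions produce the same block. The second one does n
multiplications instead of 16, which matters when most blocks carry
only a few coded coefficients.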

Attachment: rv34_change_dequant.diff

_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
