Hi,

first of all, a disclaimer: git is not my cup of tea. While this is
presented as a series, I'll get back to working on each patch serially
afterwards.
In addition, I'd like to focus on the algorithmic changes first and,
once those have stabilized, consider the rest.

Here is a series of patches related to dequantization in RV 30/40:
0) MMX2 and SSE2 versions of the 4x4 dequant function
Given the DCT and dequant formulas, I assumed the dequant is fine with
16-bit intermediates (the NEON code does not assume this).
This patch also adds the necessary build system changes.
C takes around 77 cycles, MMX 31 and SSE2 around 30 (measured with
START/STOP_TIMER). If no further optimizations are possible with the
latter, I don't see its point.
This is around a 4% speed improvement here.
1) Move QP look-up out of loop
This is in fact mostly cosmetic, as I guess the compiler already does that.
2) Check for DC-only blocks by modifying how the first 2x2 subblock is handled
This is preliminary work for the DC-only optimizations. It does not feel
natural, so I may have missed other opportunities.
3) DC-only dequantization+iDCT
Classical optimization, already present in at least the H.264 and VC1
decoders.
The dequantization (of the DC coefficient only) actually happens outside
of the DSP function to help factor out common code.
4) MMX2 optimization for the former
This is hardly worth it: it goes down from around 29 cycles to 25.
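
To illustrate the idea behind 2) and 3), here is a minimal scalar sketch,
not the code from the patches. The function names are made up for this
example, and both the dequant formula ((coef * q + 8) >> 4) and the
13/17/7 transform coefficients are assumptions based on my reading of the
C code. The point is only that, for a block whose sole non-zero
coefficient is the DC, the full transform collapses to one multiply, one
rounding shift and a fill. The DC is dequantized by the caller, outside of
the transform function.

```c
#include <stdint.h>

/* Assumed dequant formula for one coefficient (hypothetical helper). */
static int16_t dequant_coef(int16_t coef, int q)
{
    return (int16_t)((coef * q + 8) >> 4);
}

/* Returns 1 if every coefficient except block[0] is zero. */
static int is_dc_only(const int16_t block[16])
{
    for (int i = 1; i < 16; i++)
        if (block[i])
            return 0;
    return 1;
}

/* Full 4x4 inverse transform, row-major block, assumed coefficients. */
static void idct4x4_full(int16_t block[16])
{
    int tmp[16];

    /* First pass, no rounding yet. */
    for (int i = 0; i < 4; i++) {
        const int z0 = 13 * (block[i] + block[i + 8]);
        const int z1 = 13 * (block[i] - block[i + 8]);
        const int z2 =  7 * block[i + 4] - 17 * block[i + 12];
        const int z3 = 17 * block[i + 4] +  7 * block[i + 12];

        tmp[4 * i + 0] = z0 + z3;
        tmp[4 * i + 1] = z1 + z2;
        tmp[4 * i + 2] = z1 - z2;
        tmp[4 * i + 3] = z0 - z3;
    }

    /* Second pass, with rounding and the final shift. */
    for (int i = 0; i < 4; i++) {
        const int z0 = 13 * (tmp[i] + tmp[i + 8]);
        const int z1 = 13 * (tmp[i] - tmp[i + 8]);
        const int z2 =  7 * tmp[i + 4] - 17 * tmp[i + 12];
        const int z3 = 17 * tmp[i + 4] +  7 * tmp[i + 12];

        block[4 * i + 0] = (int16_t)((z0 + z3 + 0x200) >> 10);
        block[4 * i + 1] = (int16_t)((z1 + z2 + 0x200) >> 10);
        block[4 * i + 2] = (int16_t)((z1 - z2 + 0x200) >> 10);
        block[4 * i + 3] = (int16_t)((z0 - z3 + 0x200) >> 10);
    }
}

/* DC-only shortcut: block[0] holds the already dequantized DC. */
static void idct4x4_dc(int16_t block[16])
{
    /* Both passes scale the DC by 13, hence 13 * 13. */
    const int16_t dc = (int16_t)((13 * 13 * block[0] + 0x200) >> 10);

    for (int i = 0; i < 16; i++)
        block[i] = dc;
}
```

A caller would dequantize block[0] with dequant_coef(), then pick
idct4x4_dc() when is_dc_only() holds and idct4x4_full() otherwise.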

The whole series provides a 7% speed improvement with the unoptimized iDCT.

Best regards,
Christophe

Attachment: 0-rv34_dequant.diff
Description: Binary data

Attachment: 1-rv34_q_lookup.diff
Description: Binary data

Attachment: 2-skip_decode.diff
Description: Binary data

Attachment: 3-rv34_dequant_idct_dc.diff
Description: Binary data

Attachment: 4-rv34_mmx2_idct_dequant.diff
Description: Binary data

_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
