Hi,

First of all, a disclaimer: git is not my cup of tea. While this is presented as a series, I'll get back to working serially on each patch afterwards. In addition, I'd like to focus on algorithmic changes first and, once those have stabilized, consider the rest.
Here is a series of patches related to dequantization in RV30/40:

0) MMX2 and SSE2 versions of the 4x4 dequant function
Given the DCT and dequant formulas, I assumed the dequant was OK with 16-bit intermediates (the NEON code does not). This patch also adds the build system changes. C is around 77 cycles, MMX is 31 and SSE2 is around 30 (using START/STOP_TIMER). If no further optimizations are possible with the latter, I don't see its point. This is around a 4% speed improvement here.

1) Move QP look-up out of the loop
This is in fact mostly cosmetic, as I guess the compiler already does that.

2) Check for DC-only blocks by modifying how the first 2x2 subblock is handled
This is preliminary work for DC-only optimizations. It does not feel natural, so I may have missed other opportunities.

3) DC-only dequantization+iDCT
A classical optimization already present in the H.264 and VC1 decoders at least. The dequantization (of only the DC coeff) actually happens outside of the DSP function to help factor the code.

4) MMX2 optimization for the former
This is hardly worth it: this goes down from around 29 cycles to 25.

The whole series provides a 7% speed improvement with unoptimized iDCT.

Best regards,
Christophe
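P.S. To make the idea behind 3) concrete, here is a minimal sketch rather than the code from the patches. It assumes a (coeff * Q + 8) >> 4 dequant rounding and a 13/17/7 integer transform normalized by >> 10. The function names and the contiguous 4x4 block layout are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: dequantize one coefficient,
 * assuming (c * q + 8) >> 4 rounding. */
static inline int dequant_coeff(int c, int q)
{
    assert(q > 0);
    return (c * q + 8) >> 4;
}

/* Full 4x4 inverse transform on a contiguous block (stride 4).
 * Assumed integer basis: 13, 17, 7; final normalization: >> 10. */
static void inv_transform_4x4(int16_t block[16])
{
    int temp[16];
    int i;

    /* First pass: transform each column into a row of temp[]. */
    for (i = 0; i < 4; i++) {
        const int z0 = 13 * (block[i + 4 * 0] + block[i + 4 * 2]);
        const int z1 = 13 * (block[i + 4 * 0] - block[i + 4 * 2]);
        const int z2 =  7 * block[i + 4 * 1] - 17 * block[i + 4 * 3];
        const int z3 = 17 * block[i + 4 * 1] +  7 * block[i + 4 * 3];

        temp[4 * i + 0] = z0 + z3;
        temp[4 * i + 1] = z1 + z2;
        temp[4 * i + 2] = z1 - z2;
        temp[4 * i + 3] = z0 - z3;
    }

    /* Second pass: same butterfly, with rounding and normalization. */
    for (i = 0; i < 4; i++) {
        const int z0 = 13 * (temp[4 * 0 + i] + temp[4 * 2 + i]) + 0x200;
        const int z1 = 13 * (temp[4 * 0 + i] - temp[4 * 2 + i]) + 0x200;
        const int z2 =  7 * temp[4 * 1 + i] - 17 * temp[4 * 3 + i];
        const int z3 = 17 * temp[4 * 1 + i] +  7 * temp[4 * 3 + i];

        block[i * 4 + 0] = (z0 + z3) >> 10;
        block[i * 4 + 1] = (z1 + z2) >> 10;
        block[i * 4 + 2] = (z1 - z2) >> 10;
        block[i * 4 + 3] = (z0 - z3) >> 10;
    }
}

/* DC-only shortcut: when only block[0] is non-zero, both passes reduce
 * to scaling by 13 * 13, so every output sample has the same value. */
static void inv_transform_4x4_dc(int16_t block[16])
{
    const int16_t dc = (13 * 13 * block[0] + 0x200) >> 10;
    int i;

    for (i = 0; i < 16; i++)
        block[i] = dc;
}
```

With only the DC coefficient set, the caller dequantizes that single value and then calls the shortcut, which should give the same samples as the full transform while skipping both butterfly passes.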
0-rv34_dequant.diff
1-rv34_q_lookup.diff
2-skip_decode.diff
3-rv34_dequant_idct_dc.diff
4-rv34_mmx2_idct_dequant.diff
