Re: [libav-devel] [PATCH] Add h264_idct_10.asm

Loren Merritt Tue, 24 May 2011 22:08:16 -0700

On Tue, 24 May 2011, Daniel Kang wrote:

On Tue, May 24, 2011 at 10:43 AM, Loren Merritt <[email protected]> wrote:

Are you sure you don't want to deinline the idct part and unroll the loop
over blocks? If not, what's different about h264_idct_add16_sse2?


Different as compared to what?

Differences between h264_idct_add16_sse2 and h264_idct_add16_10_sse2 that
would cause different strategies to be optimal.

I can unroll it if you prefer.

I'm not stating a preference, I'm describing a strategy that might
or might not be faster. Strategy found by pattern-matching on existing
code, not by abstract reasoning; I expect a large speed gain only because
a comment on the existing code says so.

In a previous patch you had deinlined IDCT8. Did you decide that it's ok to
spend 2kb on this function? Or 4kb since h264_idct8_add4_10 doesn't call
h264_idct8_add_10?


That patch was for x264. Here, arguments change. I guess I could make
another function if you really prefer... It would require pushing args to
the stack or xchg's.

Likewise, the preference is for whichever is faster, combined with a
prediction that icache is important.


--Loren Merritt
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
