Re: [libav-devel] [PATCH 2/3] prores: idct sse2/sse4 optimizations.

Ronald S. Bultje Wed, 05 Oct 2011 04:23:06 -0700

Hi,

On Tue, Oct 4, 2011 at 6:35 AM, Ronald S. Bultje <[email protected]> wrote:
> Attached.


Summary of IRC review by Jason:

Dark_Shikari> pmaxsw: I have no idea why you have a third argument here
Dark_Shikari> it probably didn't pass fate

(Oops) done.

Dark_Shikari> FYI, when you override instructions, please use
capitalized macro names
Dark_Shikari> e.g. SIGNEXTEND

done.

Dark_Shikari> movdqa -> mova everywhere

done.

Dark_Shikari> pslld: use paddd instead
Dark_Shikari> in the case where you have pslld x,1
Dark_Shikari> faster on many cpus

done.
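
For reference, the identity behind this suggestion: shifting a 32-bit lane left by one doubles it, which is the same as adding the lane to itself (both wrap modulo 2^32). Below is a minimal C sketch using SSE2 intrinsics. It is not code from the patch, and the helper names are made up for illustration.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: double four 32-bit lanes with a shift,
 * the intrinsic counterpart of "pslld x, 1". */
static void double_with_shift(const uint32_t in[4], uint32_t out[4])
{
    __m128i x = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, _mm_slli_epi32(x, 1));
}

/* Hypothetical helper: the same doubling with an add,
 * the intrinsic counterpart of "paddd x, x". */
static void double_with_add(const uint32_t in[4], uint32_t out[4])
{
    __m128i x = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi32(x, x));
}

/* Returns 1 if both forms produce identical lanes for the given input. */
static int doubling_forms_agree(const uint32_t in[4])
{
    uint32_t a[4], b[4];
    int i;

    double_with_shift(in, a);
    double_with_add(in, b);
    for (i = 0; i < 4; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

Note that this only shows the two forms are equivalent. A C compiler is free to pick either instruction for both helpers, so the speed difference Jason mentions only matters in hand-written asm.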

Dark_Shikari> 285-286: SBUTTERFLY
Dark_Shikari> er, -288
Dark_Shikari> same with 227-231
Dark_Shikari> and 306-309 etc

done.
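
For anyone not familiar with the macro: as I understand it, SBUTTERFLY replaces a pair of registers with their low and high interleaves (punpckl*/punpckh*), which is the usual building block for transposes. Below is a rough C model of the word-to-dword case using SSE2 intrinsics. The function name is hypothetical and this is a sketch of the idea, not code from the patch.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical C model of "SBUTTERFLY wd, a, b, tmp":
 *   a <- low four words of a and b, interleaved  (punpcklwd)
 *   b <- high four words of a and b, interleaved (punpckhwd) */
static void sbutterfly_wd(int16_t a[8], int16_t b[8])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i lo = _mm_unpacklo_epi16(va, vb);  /* a0 b0 a1 b1 a2 b2 a3 b3 */
    __m128i hi = _mm_unpackhi_epi16(va, vb);  /* a4 b4 a5 b5 a6 b6 a7 b7 */

    _mm_storeu_si128((__m128i *)a, lo);
    _mm_storeu_si128((__m128i *)b, hi);
}
```

The point of using the macro in the asm is that it hides the temporary register and the copy that the 2-operand punpck forms otherwise need.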

Dark_Shikari> 239-244: avx please
Dark_Shikari> 294-295 avx... you get my point.
Dark_Shikari> 3-operand is your friend
Dark_Shikari> 509-510: avx

done. The AVX version now takes 197 cycles (SSE4 takes 221, SSE2 takes 222) on ifb's Sandy Bridge.

Dark_Shikari> Is there any easy way to swap around your register names
to maximize the number of instructions which don't contain m>=8?
Dark_Shikari> (within reason)
Dark_Shikari> one byte less per instruction

I haven't done this yet, but I wanted to post the current patch for
further review in the meantime. I'll see what I can do about the above.

Ronald

0001-prores-idct-sse2-sse4-optimizations.patch
Description: Binary data

_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel

Re: [libav-devel] [PATCH 2/3] prores: idct sse2/sse4 optimizations.
