On Thu, 16 Jun 2011, Justin Ruggles wrote:
> On 06/12/2011 04:31 PM, Ronald S. Bultje wrote:
>> Hi,
>> On Sat, Jun 11, 2011 at 10:35 AM, Justin Ruggles
>> <[email protected]> wrote:
>>> ---
>>> libavcodec/dsputil.c | 17 +++++++
>>> libavcodec/dsputil.h | 14 ++++++
>>> libavcodec/x86/dsputil_mmx.c | 15 +++++++
>>> libavcodec/x86/dsputil_yasm.asm | 88 +++++++++++++++++++++++++++++++++++++++
>>> 4 files changed, 134 insertions(+), 0 deletions(-)
>> [..]
>>> + CLIPD m0, m4, m5, m6
>>> + CLIPD m1, m4, m5, m6
>>> + CLIPD m2, m4, m5, m6
>>> + CLIPD m3, m4, m5, m6
>> For something like Atom (or basically anything with out-of-order
>> execution), this could be interleaved (i.e. CLIPDx2 m0, m1, m4, m5,
>> m6). With that changed, looks good to me, feel free to apply.
> I tested that on Atom and it doesn't improve speed. But it doesn't hurt
> speed either. Should we do it anyway?
> Also, unrolling to 32 values per loop on x86-64 does help, so I'll send
> an updated patch to do that.
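For readers not following the asm, here is a rough C sketch of what unrolling the clip loop means. The names clip_one and clip_int32_unrolled are made up for illustration and are not from the patch. The unroll factor of 4 and the requirement that len be a multiple of 4 are assumptions of the sketch only.

```c
#include <stdint.h>

/* Hypothetical scalar clip of a single value to [min, max]. */
static int32_t clip_one(int32_t v, int32_t min, int32_t max)
{
    return v < min ? min : (v > max ? max : v);
}

/* Hypothetical unrolled loop: four independent clips per iteration.
 * Assumption for this sketch: len is a multiple of 4. */
static void clip_int32_unrolled(int32_t *dst, const int32_t *src,
                                int32_t min, int32_t max, unsigned len)
{
    for (unsigned i = 0; i < len; i += 4) {
        dst[i + 0] = clip_one(src[i + 0], min, max);
        dst[i + 1] = clip_one(src[i + 1], min, max);
        dst[i + 2] = clip_one(src[i + 2], min, max);
        dst[i + 3] = clip_one(src[i + 3], min, max);
    }
}
```

The four statements in the loop body do not depend on each other, which is what gives the CPU (or the person scheduling the asm) room to overlap them.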
On Atom, you mean? Penryn is indifferent to amount of unrolling here.
Can you unroll with %rep instead of copy/paste?
Also, document the limitations on min/max values due to the float
implementation.
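To illustrate the kind of limitation meant here, below is a small C model. It assumes the float path converts the int32 values to float, clips them, and converts back; this is only a guess at what "float implementation" covers, and clip_via_float and clip_exact are hypothetical names. A float has a 24-bit significand, so integers above 1<<24 in magnitude are not all representable, and a bound outside that range can get rounded.

```c
#include <stdint.h>

/* Hypothetical model of clipping through float (not the actual asm). */
static int32_t clip_via_float(int32_t v, int32_t min, int32_t max)
{
    float f  = (float)v;    /* may round if |v| > (1 << 24) */
    float lo = (float)min;  /* may round if |min| > (1 << 24) */
    float hi = (float)max;  /* may round if |max| > (1 << 24) */
    if (f > hi) f = hi;
    if (f < lo) f = lo;
    return (int32_t)f;
}

/* Integer-only reference for comparison. */
static int32_t clip_exact(int32_t v, int32_t min, int32_t max)
{
    return v < min ? min : (v > max ? max : v);
}
```

In this model, max = (1 << 24) + 1 is rounded to 1 << 24 on conversion, so an input of (1 << 24) + 1 comes back as 1 << 24 where the integer version returns it unchanged. With min and max both inside [-(1 << 24), 1 << 24] the two versions agree, since every in-range result is exactly representable.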
--Loren Merritt