Re: [libav-devel] [PATCH 1/3] ac3enc: add int32_t array clipping function to DSPUtil, including x86 versions.

Loren Merritt Fri, 17 Jun 2011 18:45:18 -0700

On Thu, 16 Jun 2011, Justin Ruggles wrote:

> On 06/12/2011 04:31 PM, Ronald S. Bultje wrote:
>
> > Hi,
> >
> > On Sat, Jun 11, 2011 at 10:35 AM, Justin Ruggles
> > <[email protected]> wrote:
> >
> > > ---
> > >  libavcodec/dsputil.c            |   17 +++++++
> > >  libavcodec/dsputil.h            |   14 ++++++
> > >  libavcodec/x86/dsputil_mmx.c    |   15 +++++++
> > >  libavcodec/x86/dsputil_yasm.asm |   88 +++++++++++++++++++++++++++++++++++++++
> > >  4 files changed, 134 insertions(+), 0 deletions(-)
> >
> > [..]
> >
> > > +    CLIPD  m0, m4, m5, m6
> > > +    CLIPD  m1, m4, m5, m6
> > > +    CLIPD  m2, m4, m5, m6
> > > +    CLIPD  m3, m4, m5, m6
> >
> > For something like Atom (or basically anything without out-of-order
> > execution), this could be interleaved (i.e. CLIPDx2 m0, m1, m4, m5,
> > m6). With that changed, looks good to me, feel free to apply.
>
> I tested that on Atom and it doesn't improve speed. But it doesn't hurt
> speed either. Should we do it anyway?
>
> Also, unrolling to 32 values per loop on x86-64 does help, so I'll send
> an updated patch to do that.


On Atom, you mean? Penryn is indifferent to the amount of unrolling here.
Can you unroll with %rep instead of copy/paste?

Also, document the limitations on min/max values due to the float
implementation.
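
To illustrate why such limits exist, here is a minimal scalar sketch in C. It
is not the code from the patch: the function names are hypothetical, and it
assumes the float version converts each int32_t to single-precision float,
clamps there, and converts back. A float has a 24-bit significand, so every
integer with magnitude up to 1<<24 is exactly representable, but (1<<24)+1 is
not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain integer reference clip: exact for any min <= max. */
static void clip_int32_ref(int32_t *dst, const int32_t *src,
                           int32_t min, int32_t max, size_t len)
{
    assert(min <= max);
    for (size_t i = 0; i < len; i++) {
        int32_t v = src[i];
        dst[i] = v < min ? min : (v > max ? max : v);
    }
}

/*
 * Hypothetical scalar model of a float-based clip (an assumption about how
 * a SIMD float path behaves, not the patch itself). The bounds and the data
 * are rounded to float before comparing. Keep min/max well inside the
 * int32_t range: a bound that rounds up to 2^31 would make the conversion
 * back to int32_t overflow.
 */
static void clip_int32_via_float(int32_t *dst, const int32_t *src,
                                 int32_t min, int32_t max, size_t len)
{
    assert(min <= max);
    const float fmin = (float)min;  /* inexact if |min| > 1<<24 */
    const float fmax = (float)max;  /* inexact if |max| > 1<<24 */
    for (size_t i = 0; i < len; i++) {
        float v = (float)src[i];    /* large inputs round, but stay ordered */
        if (v < fmin) v = fmin;
        if (v > fmax) v = fmax;
        dst[i] = (int32_t)v;
    }
}
```

With both bounds inside [-(1<<24), 1<<24] the two functions agree for every
input, because the rounding of out-of-range data is monotonic and the bounds
themselves are exact. With max = (1<<24)+1 the bound itself rounds down to
1<<24, so the float path can return a value one below what the integer
reference returns.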


--Loren Merritt

_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
