I'm super late to the party, but here's a nicely formatted
comparison between the x86 and the SIMD compilation:

http://pastebin.com/qLtKGW3E


The SIMD machine code is extremely clean


It's possible that it would be more pipeline-friendly if you move down
                cb = *cb_g_buf;

to right before the g = assignments, but that shouldn't make a giant
difference...


Might also be worth turning the values set up at the top (y_add,
zero, max) into actual constants in the code, since currently they're
getting loaded into registers.
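
To make the constants idea concrete, here is a rough sketch of what I mean, not the actual FreeRDP code. The names y_add, zero and max are the ones from the decoder; the values 128 and 255, and both function names, are my assumptions for illustration. Declaring them const right next to their use gives the compiler the chance to treat them as compile-time constants, though whether that changes the generated code would need checking in the disassembly.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: bias 8 signed 16-bit values, then clamp to [0, 255].
 * The constant values (128, 255) are assumptions, not taken from the thread. */
static inline __m128i bias_and_clamp_epi16(__m128i v)
{
    const __m128i y_add = _mm_set1_epi16(128); /* assumed bias */
    const __m128i zero = _mm_setzero_si128();  /* lower bound */
    const __m128i max = _mm_set1_epi16(255);   /* assumed upper bound */

    v = _mm_add_epi16(v, y_add); /* v + 128 */
    v = _mm_max_epi16(v, zero);  /* clamp below at 0 */
    v = _mm_min_epi16(v, max);   /* clamp above at 255 */
    return v;
}

/* Hypothetical driver: process a buffer 8 coefficients at a time, in place. */
void bias_and_clamp_buffer(int16_t *buf, int n)
{
    assert(n % 8 == 0); /* sketch only handles whole 8-element groups */
    for (int i = 0; i < n; i += 8) {
        /* Unaligned load/store so the sketch works on any buffer. */
        __m128i v = _mm_loadu_si128((const __m128i *)(buf + i));
        _mm_storeu_si128((__m128i *)(buf + i), bias_and_clamp_epi16(v));
    }
}
```

Comparing the disassembly of this against a version that sets the three values up once outside the loop would show whether it makes any difference.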

Sorry I'm not able to test right now, not sure how to run


On Fri, Jun 10, 2011 at 9:36 AM, Vic Lee <ll...@163.com> wrote:
> That's quite strange because it processes 8 coefficients in parallel
> and shouldn't be slower.
>
> On 06/10/2011 05:16 PM, Martin Fleisz wrote:
>> The _mm_* functions are compiler intrinsics and map 1:1 to the
>> corresponding SSE instructions. It's just a nicer and cleaner interface
>> to the instruction set (and there is no function call overhead).
>
>
>
> ------------------------------------------------------------------------------
> EditLive Enterprise is the world's most technically advanced content
> authoring tool. Experience the power of Track Changes, Inline Image
> Editing and ensure content is compliant with Accessibility Checking.
> http://p.sf.net/sfu/ephox-dev2dev
> _______________________________________________
> Freerdp-devel mailing list
> Freerdp-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/freerdp-devel
>
