I'm super late to the party, but here's a nicely formatted comparison between the x86 and the SIMD compilation:
http://pastebin.com/qLtKGW3E

The SIMD machine code is extremely clean. It might be more pipeline-friendly if you moved cb = *cb_g_buf; down to right before the g = assignments, but that shouldn't make a giant difference...

It might also be worth writing the constants at the top (y_add, zero, max) as constants in the code, as they're currently getting loaded into registers.

Sorry, I'm not able to test right now, not sure how to run

On Fri, Jun 10, 2011 at 9:36 AM, Vic Lee <ll...@163.com> wrote:
> That's quite strange because it processes 8 coefficients in parallel
> and shouldn't be slower.
>
> On 06/10/2011 05:16 PM, Martin Fleisz wrote:
>> The _mm_* functions are compiler intrinsics and map 1:1 to the
>> corresponding SSE instructions. It's just a nicer and cleaner interface
>> to the instruction set (and there is no function call overhead).

_______________________________________________
Freerdp-devel mailing list
Freerdp-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/freerdp-devel