Vic,

On 6/10/2011 9:36 AM, Vic Lee wrote:
> That's quite strange because it processes 8 coefficients in parallel
> and shouldn't be slower.
>

I agree. At this point I have no idea how it can still be slower, but it is. Granted, this is my first time writing SSE code, and for all I know I am doing something horribly wrong.
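
For reference, here is a minimal sketch of what "8 coefficients in parallel" usually means with SSE2, assuming the coefficients are 16-bit integers so that eight of them fit in one 128-bit register. The function names and the scaling operation are hypothetical and are not taken from the FreeRDP code; the sketch falls back to the plain loop when SSE2 is not available at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Hypothetical scalar baseline: multiply each 16-bit coefficient by a
 * factor, keeping the low 16 bits of the product. */
static void scale_coefficients_scalar(const int16_t *src, int16_t *dst,
                                      size_t count, int16_t factor)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < count; i++)
        dst[i] = (int16_t)(src[i] * factor);
}

/* Hypothetical SIMD variant: same result, eight coefficients per step. */
static void scale_coefficients_simd(const int16_t *src, int16_t *dst,
                                    size_t count, int16_t factor)
{
    size_t i = 0;
    assert(src != NULL && dst != NULL);
#if defined(__SSE2__)
    const __m128i f = _mm_set1_epi16(factor); /* factor in all 8 lanes */
    for (; i + 8 <= count; i += 8) {
        /* Unaligned load/store, so the buffers need no special alignment. */
        __m128i v = _mm_loadu_si128((const __m128i *)(src + i));
        v = _mm_mullo_epi16(v, f); /* low 16 bits of each product */
        _mm_storeu_si128((__m128i *)(dst + i), v);
    }
#endif
    /* Remaining elements (or everything, when SSE2 is unavailable). */
    for (; i < count; i++)
        dst[i] = (int16_t)(src[i] * factor);
}
```

Timing both variants on the same buffer is one way to check whether the vector path actually helps. An optimizing compiler may auto-vectorize the scalar loop, and a loop this simple can be limited by memory rather than arithmetic, so the two can end up closer than expected.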

I did run across some information indicating that the order of instructions can have a big impact on performance. Apparently a poor ordering can leave the CPU stalled on a cache miss, so that the scheduler has to wait for the memory fetch before continuing. Maybe this is what we are running into? If it is, however, I'm not sure how to optimize it further. These SSE instructions are definitely proving to be more finicky than advertised.

Thanks,
Steve

_______________________________________________
Freerdp-devel mailing list
Freerdp-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/freerdp-devel