Vic,

On 6/10/2011 9:36 AM, Vic Lee wrote:
> That's quite strange because it processes 8 coefficients in parallel
> and shouldn't be slower.
>
I agree.  At this point I have no idea how it can still be slower, but 
it is.  Granted, this is my first time writing SSE code, and for all I 
know, I am doing something horribly wrong.
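For reference, here is roughly what "8 coefficients in parallel" looks like. This is a minimal sketch that assumes 16-bit (int16_t) coefficients, so eight of them fit in one 128-bit SSE2 register. The operation (a multiply by a constant) and the function names scale_coeffs_scalar / scale_coeffs_sse2 are hypothetical, chosen only for illustration; this is not the actual FreeRDP code.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-coefficient operation, scalar version:
   one coefficient per loop iteration. */
void scale_coeffs_scalar(int16_t *coeffs, size_t n, int16_t factor)
{
    for (size_t i = 0; i < n; i++)
        coeffs[i] = (int16_t)(coeffs[i] * factor);
}

/* Same operation with SSE2: eight 16-bit coefficients per multiply. */
void scale_coeffs_sse2(int16_t *coeffs, size_t n, int16_t factor)
{
    const __m128i f = _mm_set1_epi16(factor); /* factor in all 8 lanes */
    size_t i = 0;

    for (; i + 8 <= n; i += 8) {
        /* Unaligned load/store, so the buffer need not be 16-byte aligned. */
        __m128i v = _mm_loadu_si128((const __m128i *)(coeffs + i));
        v = _mm_mullo_epi16(v, f); /* keep low 16 bits of each product */
        _mm_storeu_si128((__m128i *)(coeffs + i), v);
    }

    /* Scalar tail for the last n % 8 coefficients. */
    for (; i < n; i++)
        coeffs[i] = (int16_t)(coeffs[i] * factor);
}
```

For inputs that don't overflow 16 bits, both functions should give identical results, which makes the scalar version a handy reference when checking the SSE path for correctness before timing it.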

I did run across some information suggesting that the order of 
instructions can have a big impact on performance.  Apparently a poor 
ordering can leave the CPU stalled after a cache miss, waiting for data 
to arrive from memory before it can continue.  Maybe this is what we are 
running into?  If it is, however, I'm not sure how to optimize it 
further.
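One common way to act on the instruction-ordering idea is to unroll the loop so several independent loads are issued before any of the arithmetic that depends on them, giving the memory accesses a chance to overlap. The sketch below assumes the same hypothetical 16-bit multiply as before (scale_coeffs_sse2_x2 is a made-up name). Whether it helps at all is uncertain: out-of-order CPUs and compilers already reorder independent instructions, so this should be measured rather than assumed.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical multiply, unrolled to 16 coefficients per iteration. */
void scale_coeffs_sse2_x2(int16_t *coeffs, size_t n, int16_t factor)
{
    const __m128i f = _mm_set1_epi16(factor);
    size_t i = 0;

    for (; i + 16 <= n; i += 16) {
        /* Both loads come first and don't depend on each other,
           so the two memory accesses can be in flight together. */
        __m128i v0 = _mm_loadu_si128((const __m128i *)(coeffs + i));
        __m128i v1 = _mm_loadu_si128((const __m128i *)(coeffs + i + 8));

        v0 = _mm_mullo_epi16(v0, f);
        v1 = _mm_mullo_epi16(v1, f);

        _mm_storeu_si128((__m128i *)(coeffs + i), v0);
        _mm_storeu_si128((__m128i *)(coeffs + i + 8), v1);
    }

    /* At most one leftover group of 8 coefficients. */
    for (; i + 8 <= n; i += 8) {
        __m128i v = _mm_loadu_si128((const __m128i *)(coeffs + i));
        _mm_storeu_si128((__m128i *)(coeffs + i), _mm_mullo_epi16(v, f));
    }

    /* Scalar tail for whatever remains. */
    for (; i < n; i++)
        coeffs[i] = (int16_t)(coeffs[i] * factor);
}
```

Timing this against the non-unrolled loop on the actual data would show fairly quickly whether load ordering is the bottleneck or whether the slowdown lies elsewhere.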

These SSE instructions are definitely proving to be more finicky than 
advertised.

Thanks,
  Steve

_______________________________________________
Freerdp-devel mailing list
Freerdp-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/freerdp-devel
