SUCCESS! I figured out the major problem with my SSE code. It turns out you have to pay special attention to how the CPU accesses memory and provide the proper cache hints, so that it doesn't starve itself and make you wait on slow RAM accesses.
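As a rough illustration of the kind of cache hint described above, here is a minimal C sketch that uses the SSE `_mm_prefetch` intrinsic with `_MM_HINT_NTA` to request every cache line of a buffer before an SSE2 loop processes it. This is not the code from the commit; the helper names `prefetch_buffer` and `bias_and_clamp_sse2`, the 64-byte cache line size, and the bias-and-clamp kernel are all hypothetical, chosen only to show the general technique.

```c
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h> /* _mm_prefetch, _MM_HINT_NTA */
#include <emmintrin.h> /* SSE2 integer intrinsics */

/* Assumption: 64-byte cache lines (typical for Intel Atom and most x86 CPUs). */
#define CACHE_LINE_BYTES 64

/* Hypothetical helper: issue one non-temporal prefetch per cache line of buf,
   so the data is already on its way into cache before the SIMD loop needs it. */
static void prefetch_buffer(const void *buf, size_t bytes)
{
    const char *p = (const char *)buf;
    for (size_t off = 0; off < bytes; off += CACHE_LINE_BYTES)
        _mm_prefetch(p + off, _MM_HINT_NTA);
}

/* Hypothetical SSE2 kernel: buf[i] = clamp(buf[i] + 128, 0, 255),
   8 int16 samples per iteration.
   Assumes buf is 16-byte aligned and n is a multiple of 8. */
static void bias_and_clamp_sse2(int16_t *buf, size_t n)
{
    const __m128i bias = _mm_set1_epi16(128);
    const __m128i zero = _mm_setzero_si128();
    const __m128i max  = _mm_set1_epi16(255);

    /* Hint the whole buffer into cache up front. */
    prefetch_buffer(buf, n * sizeof(int16_t));

    for (size_t i = 0; i < n; i += 8) {
        __m128i v = _mm_load_si128((const __m128i *)(buf + i));
        v = _mm_adds_epi16(v, bias);                    /* saturating add avoids wraparound */
        v = _mm_min_epi16(_mm_max_epi16(v, zero), max); /* clamp to [0, 255] */
        _mm_store_si128((__m128i *)(buf + i), v);
    }
}
```

Whether prefetching the whole buffer up front or a fixed distance ahead of the loop works better depends on buffer size and the CPU's cache, so it is worth measuring both with the profiler.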
This simple commit has made a huge difference in RemoteFX performance (more optimizations to come):
https://github.com/FreeRDP/FreeRDP/commit/220008fad7dc7eabc81e2e03b81604145d369ad4

(Columns: function | call count | total time | average time per call)

Before the change:
| rfx_decode_YCbCr_to_RGB_SSE2 | 15671 | 4.740000 | 0.000302 |

After the change:
| rfx_decode_YCbCr_to_RGB_SSE2 | 24945 | 0.460000 | 0.000018 |

That is roughly a 16x improvement in time per call over the previous version (0.000302 vs. 0.000018), and the difference is visually noticeable. The SSE-optimized method is now about 5-6x faster than the non-SSE method. Both of these results came from my Intel Atom D510 board.

Thanks,
Steve

On 6/10/2011 1:09 AM, S. Erisman wrote:
> Hey Vic,
>
> On 6/10/2011 12:32 AM, Vic Lee wrote:
>> Hi Steve,
>>
>> Yes, both are faster, but the SSE version is still quite a bit slower than
>> the original one. Here is my testing.
>>
>> Before pulling:
>> | rfx_decode_YCbCr_to_RGB_SSE2 | 2123 | 1.750000 | 0.000824 |
>> | rfx_decode_YCbCr_to_RGB      | 2098 | 0.260000 | 0.000124 |
>>
>> After pulling your commits:
>> | rfx_decode_YCbCr_to_RGB_SSE2 | 2049 | 0.690000 | 0.000337 |
>> | rfx_decode_YCbCr_to_RGB      | 2111 | 0.240000 | 0.000114 |
>>
>> Oh, by the way, the profiler is cool. :)
>>
>> Vic