SUCCESS!

I figured out the major problem with my SSE code.  Apparently you have 
to pay special attention to how the CPU accesses memory and provide the 
proper cache hints, so that it doesn't starve itself and leave you 
waiting on slow RAM accesses.
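
For anyone wondering what a "cache hint" looks like in practice, here is a 
minimal sketch using the standard SSE intrinsics.  It is not copied from 
the commit: prefetch_buffer, add_bias_sse2 and the 64-byte cache line 
size are made-up names and assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h> /* SSE2 intrinsics; also provides _mm_prefetch */

#define CACHE_LINE_BYTES 64 /* assumption: typical x86 cache line size */

/* Hypothetical helper: hint the CPU to start pulling a buffer into cache,
 * touching one address per cache line. */
static void prefetch_buffer(const void *buffer, size_t num_bytes)
{
    const char *p = (const char *)buffer;
    size_t off;

    for (off = 0; off < num_bytes; off += CACHE_LINE_BYTES)
        _mm_prefetch(p + off, _MM_HINT_T0);
}

/* Hypothetical kernel: add `bias` to every 16-bit sample, 8 samples per
 * SSE2 register.  `samples` must be 16-byte aligned and `count` must be
 * a multiple of 8. */
static void add_bias_sse2(int16_t *samples, size_t count, int16_t bias)
{
    __m128i *v = (__m128i *)samples;
    const __m128i b = _mm_set1_epi16(bias);
    size_t i;

    assert(((uintptr_t)samples & 15) == 0);
    assert(count % 8 == 0);

    /* Issue the hints first, then do the arithmetic. */
    prefetch_buffer(samples, count * sizeof(int16_t));

    for (i = 0; i < count / 8; i++)
        _mm_store_si128(&v[i], _mm_add_epi16(_mm_load_si128(&v[i]), b));
}
```

The prefetch is only a hint: it never changes the result, so the only way 
to know whether it helps on a given CPU is to profile with and without it.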

This simple commit has made a huge difference in RemoteFX performance 
(more optimizations to come):
     
https://github.com/FreeRDP/FreeRDP/commit/220008fad7dc7eabc81e2e03b81604145d369ad4

before change:

        | rfx_decode_YCbCr_to_RGB_SSE2   |      15671 |  4.740000 |  0.000302 |

after change:

        | rfx_decode_YCbCr_to_RGB_SSE2   |      24945 |  0.460000 |  0.000018 |


Going by the last column (0.000302 before, 0.000018 after), that is 
roughly a 16x improvement per call over the previous version, and the 
difference is visually noticeable.
The SSE-optimized method is now about 5-6x faster than the non-SSE method.

Both of these results came from my Intel Atom D510 board.

Thanks,
Steve


On 6/10/2011 1:09 AM, S. Erisman wrote:
> Hey Vic,
>
> On 6/10/2011 12:32 AM, Vic Lee wrote:
>> Hi Steve,
>>
>> Yes, both are faster, but the SSE version is still considerably slower
>> than the original one. Here are my test results.
>>
>> Before pulling:
>> | rfx_decode_YCbCr_to_RGB_SSE2  |       2123 |  1.750000 |  0.000824 |
>> | rfx_decode_YCbCr_to_RGB       |       2098 |  0.260000 |  0.000124 |
>>
>> After pulling your commits:
>> | rfx_decode_YCbCr_to_RGB_SSE2  |       2049 |  0.690000 |  0.000337 |
>> | rfx_decode_YCbCr_to_RGB       |       2111 |  0.240000 |  0.000114 |
>>
>> Oh by the way, the profiler is cool. :)
>>
>> Vic


_______________________________________________
Freerdp-devel mailing list
Freerdp-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/freerdp-devel
