Hi Steve,
Well, that was fast :) I had started thinking about the different ways we can
integrate this SSE acceleration with the rest of the code in a clean way. I
see two major options:
1) The SSE code is part of the library, and can be disabled with a
compile-time option. Methods for SSE and non-SSE have the same signature. At
initialization, detection of SSE support level is performed, and a callback
is registered to point to one of the methods (SSE or non-SSE).
2) The SSE code is part of a loadable sub-module, just like a plugin.
Detection is done in the "main" code and SSE support functions are loaded
and registered dynamically. The only issue here is that there is not much
value in doing it this way, since SSE support level can be detected and used
dynamically anyway, and SSE is only available on the Intel architecture. On
systems where SSE is not available at all, such as ARM, the compile-time
option should discard it completely.
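To make option 1 a bit more concrete, here is a rough sketch of the
callback registration. Every name in it (rfx_simd, rfx_simd_init,
RFX_DISABLE_SSE2, the clamp functions) is made up for illustration and is
not existing FreeRDP code. It assumes GCC or Clang on x86 for the
__builtin_cpu_supports() check, and it uses a simple "saturate to 0..255"
kernel as a stand-in for the real YCbCr to RGB conversion:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical, not FreeRDP's real API. */
#if !defined(RFX_DISABLE_SSE2) && defined(__GNUC__) && \
    (defined(__x86_64__) || defined(__i386__))
#define RFX_HAVE_SSE2 1
#include <emmintrin.h>
#endif

/* The SSE and non-SSE methods share this signature. */
typedef void (*clamp_fn)(const int16_t* src, uint8_t* dst, size_t count);

typedef struct
{
    clamp_fn clamp_to_u8; /* the registered callback */
    int using_sse2;       /* 1 if the SSE2 path was selected */
} rfx_simd;

/* Portable fallback: saturate each 16-bit value to 0..255. */
static void clamp_generic(const int16_t* src, uint8_t* dst, size_t count)
{
    for (size_t i = 0; i < count; i++)
    {
        int16_t v = src[i];
        dst[i] = (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
    }
}

#ifdef RFX_HAVE_SSE2
/* SSE2 path: packus saturates 16 signed words into 16 unsigned bytes. */
__attribute__((target("sse2")))
static void clamp_sse2(const int16_t* src, uint8_t* dst, size_t count)
{
    size_t i = 0;
    for (; i + 16 <= count; i += 16)
    {
        __m128i lo = _mm_loadu_si128((const __m128i*)(src + i));
        __m128i hi = _mm_loadu_si128((const __m128i*)(src + i + 8));
        _mm_storeu_si128((__m128i*)(dst + i), _mm_packus_epi16(lo, hi));
    }
    clamp_generic(src + i, dst + i, count - i); /* scalar tail */
}
#endif

/* Called once at initialization: detect support, register one callback. */
static void rfx_simd_init(rfx_simd* simd)
{
    assert(simd != NULL);
    simd->clamp_to_u8 = clamp_generic;
    simd->using_sse2 = 0;
#ifdef RFX_HAVE_SSE2
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse2"))
    {
        simd->clamp_to_u8 = clamp_sse2;
        simd->using_sse2 = 1;
    }
#endif
}
```

The decoder would call rfx_simd_init() once and then always go through
simd.clamp_to_u8(...), without caring which version got registered.
Building with -DRFX_DISABLE_SSE2 (again, a made-up option name) leaves only
the generic path in the binary.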
I think the part which needs to be abstracted is the "SIMD" system. The
decoder should perform a check for available SIMD (SSE on Intel, NEON on
ARM) and use it if available, but when compiling for either Intel or ARM,
only the relevant implementation should actually be compiled in. I think it
would make sense to put the SSE code in separate files which are only
included if FreeRDP is compiled for an architecture that supports it. An
analogy could be made with the way we currently deal with multiple
cryptographic libraries. We have one file abstracting each crypto library
for our use, and at compile time only one of them is compiled, just like SSE
would be compiled for Intel only and NEON for ARM only.
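As a rough illustration of the compile-time selection, the sketch below
puts the three variants in one listing, where the real thing would have
one file per architecture chosen by the build system. The names
(rfx_add_sat, add_sat_scalar, RFX_SIMD_NAME) are hypothetical, and a
saturating 16-bit add stands in for real decoder work. It relies on the
compiler-defined macros __SSE2__ and __ARM_NEON / __ARM_NEON__:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar saturating add: the generic fallback, also used for loop tails. */
static void add_sat_scalar(const int16_t* a, const int16_t* b,
                           int16_t* out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
    {
        int32_t s = (int32_t)a[i] + b[i];
        if (s > INT16_MAX) s = INT16_MAX;
        if (s < INT16_MIN) s = INT16_MIN;
        out[i] = (int16_t)s;
    }
}

#if defined(__SSE2__)
/* Compiled only when targeting x86 with SSE2 enabled. */
#include <emmintrin.h>
#define RFX_SIMD_NAME "sse2"
static void rfx_add_sat(const int16_t* a, const int16_t* b,
                        int16_t* out, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8)
    {
        __m128i va = _mm_loadu_si128((const __m128i*)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i*)(b + i));
        _mm_storeu_si128((__m128i*)(out + i), _mm_adds_epi16(va, vb));
    }
    add_sat_scalar(a + i, b + i, out + i, n - i);
}
#elif defined(__ARM_NEON) || defined(__ARM_NEON__)
/* Compiled only when targeting ARM with NEON enabled. */
#include <arm_neon.h>
#define RFX_SIMD_NAME "neon"
static void rfx_add_sat(const int16_t* a, const int16_t* b,
                        int16_t* out, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8)
        vst1q_s16(out + i, vqaddq_s16(vld1q_s16(a + i), vld1q_s16(b + i)));
    add_sat_scalar(a + i, b + i, out + i, n - i);
}
#else
/* No SIMD available for this target: only the generic code is built. */
#define RFX_SIMD_NAME "generic"
static void rfx_add_sat(const int16_t* a, const int16_t* b,
                        int16_t* out, size_t n)
{
    add_sat_scalar(a, b, out, n);
}
#endif
```

Callers only ever see rfx_add_sat(), much like the rest of the code only
sees the crypto abstraction and not the library behind it.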
What would you think of option 1), done in a similar way to the current
cryptographic abstraction layer code?
On Tue, Jun 7, 2011 at 9:46 PM, S. Erisman <seris...@serisman.com> wrote:
> Marc,
>
> On 6/6/2011 9:20 AM, Marc-André Moreau wrote:
>
>> I read more about SSE, and then about NEON, which is the equivalent for ARM.
>>
>> My first impression is damn, how could I not see this before? This thing
>> looks very well suited not only for acceleration of RemoteFX decoding, but
>> there's a chance that more GDI operations could be accelerated with it than
>> the current implementation in xfreerdp. Color conversion also appears to be
>> possible with it. If someone wants to work on something like this, let me
>> know.
>>
>
> I started working on adding SSE/SSE2 decoding support to the RemoteFX
> library.
>
> I think there are several questions that still need to be answered on how
> to best wire this up, but please review the attached .patch file to see what
> I have working so far. This .patch file is based off of your recent changes
> in the awakecoding/FreeRDP branch.
>
> As a starting place, I broke the YCbCr to RGB conversion code out of
> rfx_decode_rgb and into a separate function. I then added an SSE
> 'optimized' version of it. Also included is a file with the disassembly of
> the rfx_decode.o file that clearly shows the difference between the 2
> functions.
>
> One note... I had to use a ./configure CFLAGS="-O2 -msse2" command to get
> this code to compile (the -O2 isn't actually needed, but cleans up the
> assembled code). I think we would need to find a better way of
> automatically handling this. Maybe a --with-sse flag that can be passed to
> ./configure with #ifdef lines around SSE code? Help around how to set this
> up would be appreciated.
>
> Then there are questions about structure. Should we break out SSE
> optimizations into their own files and/or libraries, or leave them alongside
> their non-SSE cousins?
>
> Lastly, is there a good way to test if and how much better these
> optimizations actually are? I started messing around with gprof, sprof, and
> oprofile, but I can't seem to get debug info out of the libfreerdp-rfx
> static library. gprof works, but only records info on the xfreerdp
> application and not on static libraries. I can't seem to get sprof or
> oprofile working either. Maybe it is just the way I was using them, but is
> there a better/easier way to profile this library? Or... maybe we could set
> up a unit test with known RFX data that can be run through a number of
> iterations and then time it?
>
> Any other thoughts?
>
> -Steve
>
>
>
>
>
> ------------------------------------------------------------------------------
> EditLive Enterprise is the world's most technically advanced content
> authoring tool. Experience the power of Track Changes, Inline Image
> Editing and ensure content is compliant with Accessibility Checking.
> http://p.sf.net/sfu/ephox-dev2dev
> _______________________________________________
> Freerdp-devel mailing list
> Freerdp-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/freerdp-devel
>
>