Marc,

I took your suggestions into account, revised my earlier patch, and 
committed my changes to a new fork:
     https://github.com/serisman/FreeRDP

... more comments below ...

On 6/7/2011 9:29 PM, Marc-André Moreau wrote:
> Hi Steve,
>
> Well, that was fast :) I had started thinking of the different ways we 
> can integrate this SSE acceleration within the rest in a clean way. I 
> see two major options:
>
> 1) The SSE code is part of the library, and can be disabled with a 
> compile-time option. Methods for SSE and non-SSE have the same 
> signature. At initialization, detection of SSE support level is 
> performed, and a callback is registered to point to one of the methods 
> (SSE or non-SSE).
This is basically what my original patch was doing (minus having the 
compile-time option).

> 2) The SSE code is part of a loadable sub-module, just like a plugin. 
> detection is done in the "main" code and SSE support functions are 
> loaded and registered dynamically. The only issue here is that there 
> is not much value to doing it this way, since SSE support level can be 
> detected and used dynamically anyway, and SSE is only available on the 
> intel architecture. On systems where SSE is not available at all, such 
> as ARM, the compile-time option should discard it completely.
I agree with you that this seems like the wrong option.

> I think the part which needs to be abstracted is the "SIMD" system. 
> The decoder should perform a check for available SIMD (SSE on intel, 
> NEON on ARM) and use it if available, but when compiling for either 
> intel or arm, only the relevant implementation should actually be 
> compiled in. I think it would make sense to put the SSE code in 
> separate files which are only included if FreeRDP is compiled for the 
> architecture which makes sense for it. An analogy could be made with 
> the way we currently deal with multiple cryptographic libraries. We 
> have one file for abstracting each crypto library for our use, and at 
> compile time only one of then is compiled, just like SSE would be 
> compiled for intel only and NEON for arm only.
>
> What would you think of option 1), done in a similar way to the 
> current cryptographic abstraction layer code?
I have most of this done in my new fork (see above).  There is a 
configure option (--with-sse2) that is enabled by default that controls 
adding the sse2 .c/.h files and defining WITH_SSE2.  The _RFX_CONTEXT 
has a 'delegate' for the 1 method that has been optimized (so far).  The 
rfx_context_new method will set the delegate to the default non-sse 
software routine, and then call rfx_sse2_init if WITH_SSE2 is defined.  
the rfx_sse2_init method then will check whether the CPU actually 
supports sse2 (future) and changes the delegate(s) to the sse optimized 
version(s). Eventually we could probably have the --with-sse2 config 
option smart enough to disable itself automatically based on which CPU 
arch was being compiled.  On the code side I separated all the SSE code 
into files in a new 'sse2' directory under libfreerdp-rfx.  What was 
done with SSE could easily be duplicated for NEON for the ARM 
architecture as well.

Is this a good enough structure to be able to continue adding more 
optimizations?

Thanks,
  Steve

------------------------------------------------------------------------------
EditLive Enterprise is the world's most technically advanced content
authoring tool. Experience the power of Track Changes, Inline Image
Editing and ensure content is compliant with Accessibility Checking.
http://p.sf.net/sfu/ephox-dev2dev
_______________________________________________
Freerdp-devel mailing list
Freerdp-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/freerdp-devel

Reply via email to