Marc, I took your suggestions into account, revised my earlier patch, and committed my changes to a new fork: https://github.com/serisman/FreeRDP
... more comments below ... On 6/7/2011 9:29 PM, Marc-André Moreau wrote: > Hi Steve, > > Well, that was fast :) I had started thinking of the different ways we > can integrate this SSE acceleration within the rest in a clean way. I > see two major options: > > 1) The SSE code is part of the library, and can be disabled with a > compile-time option. Methods for SSE and non-SSE have the same > signature. At initialization, detection of SSE support level is > performed, and a callback is registered to point to one of the methods > (SSE or non-SSE). This is basically what my original patch was doing (minus having the compile-time option). > 2) The SSE code is part of a loadable sub-module, just like a plugin. > detection is done in the "main" code and SSE support functions are > loaded and registered dynamically. The only issue here is that there > is not much value to doing it this way, since SSE support level can be > detected and used dynamically anyway, and SSE is only available on the > intel architecture. On systems where SSE is not available at all, such > as ARM, the compile-time option should discard it completely. I agree with you that this seems like the wrong option. > I think the part which needs to be abstracted is the "SIMD" system. > The decoder should perform a check for available SIMD (SSE on intel, > NEON on ARM) and use it if available, but when compiling for either > intel or arm, only the relevant implementation should actually be > compiled in. I think it would make sense to put the SSE code in > separate files which are only included if FreeRDP is compiled for the > architecture which makes sense for it. An analogy could be made with > the way we currently deal with multiple cryptographic libraries. We > have one file for abstracting each crypto library for our use, and at > compile time only one of then is compiled, just like SSE would be > compiled for intel only and NEON for arm only. > > What would you think of option 1), done in a similar way to the > current cryptographic abstraction layer code? I have most of this done in my new fork (see above). There is a configure option (--with-sse2) that is enabled by default that controls adding the sse2 .c/.h files and defining WITH_SSE2. The _RFX_CONTEXT has a 'delegate' for the 1 method that has been optimized (so far). The rfx_context_new method will set the delegate to the default non-sse software routine, and then call rfx_sse2_init if WITH_SSE2 is defined. the rfx_sse2_init method then will check whether the CPU actually supports sse2 (future) and changes the delegate(s) to the sse optimized version(s). Eventually we could probably have the --with-sse2 config option smart enough to disable itself automatically based on which CPU arch was being compiled. On the code side I separated all the SSE code into files in a new 'sse2' directory under libfreerdp-rfx. What was done with SSE could easily be duplicated for NEON for the ARM architecture as well. Is this a good enough structure to be able to continue adding more optimizations? Thanks, Steve ------------------------------------------------------------------------------ EditLive Enterprise is the world's most technically advanced content authoring tool. Experience the power of Track Changes, Inline Image Editing and ensure content is compliant with Accessibility Checking. http://p.sf.net/sfu/ephox-dev2dev _______________________________________________ Freerdp-devel mailing list Freerdp-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/freerdp-devel