Hi Eric,
On Wed, Jun 8, 2011 at 2:16 AM, S. Erisman <seris...@serisman.com
<mailto:seris...@serisman.com>> wrote:
Marc,
I took your suggestions into account, revised my earlier patch,
and committed my changes to a new fork:
https://github.com/serisman/FreeRDP
... more comments below ...
On 6/7/2011 9:29 PM, Marc-André Moreau wrote:
Hi Steve,
Well, that was fast :) I had started thinking of the different
ways we can integrate this SSE acceleration within the rest in
a clean way. I see two major options:
1) The SSE code is part of the library, and can be disabled
with a compile-time option. Methods for SSE and non-SSE have
the same signature. At initialization, detection of SSE
support level is performed, and a callback is registered to
point to one of the methods (SSE or non-SSE).
This is basically what my original patch was doing (minus having
the compile-time option).
2) The SSE code is part of a loadable sub-module, just like a
plugin. detection is done in the "main" code and SSE support
functions are loaded and registered dynamically. The only
issue here is that there is not much value to doing it this
way, since SSE support level can be detected and used
dynamically anyway, and SSE is only available on the intel
architecture. On systems where SSE is not available at all,
such as ARM, the compile-time option should discard it completely.
I agree with you that this seems like the wrong option.
I think the part which needs to be abstracted is the "SIMD"
system. The decoder should perform a check for available SIMD
(SSE on intel, NEON on ARM) and use it if available, but when
compiling for either intel or arm, only the relevant
implementation should actually be compiled in. I think it
would make sense to put the SSE code in separate files which
are only included if FreeRDP is compiled for the architecture
which makes sense for it. An analogy could be made with the
way we currently deal with multiple cryptographic libraries.
We have one file for abstracting each crypto library for our
use, and at compile time only one of then is compiled, just
like SSE would be compiled for intel only and NEON for arm only.
What would you think of option 1), done in a similar way to
the current cryptographic abstraction layer code?
I have most of this done in my new fork (see above). There is a
configure option (--with-sse2) that is enabled by default that
controls adding the sse2 .c/.h files and defining WITH_SSE2. The
_RFX_CONTEXT has a 'delegate' for the 1 method that has been
optimized (so far). The rfx_context_new method will set the
delegate to the default non-sse software routine, and then call
rfx_sse2_init if WITH_SSE2 is defined. the rfx_sse2_init method
then will check whether the CPU actually supports sse2 (future)
and changes the delegate(s) to the sse optimized version(s).
Eventually we could probably have the --with-sse2 config option
smart enough to disable itself automatically based on which CPU
arch was being compiled. On the code side I separated all the SSE
code into files in a new 'sse2' directory under libfreerdp-rfx.
What was done with SSE could easily be duplicated for NEON for
the ARM architecture as well.
Is this a good enough structure to be able to continue adding more
optimizations?
I was about to implement similar changes, you're on the good track.
Yes, please keep going, I will likely improve on your code soon or I
will at least code review it before it gets merged.
I would suggest the following for detection:
Apparently there's a "cpuinfo" instruction which can be used to detect
which version of SSE is supported by the processor, if any. I think we
should wrap such a call to get the supported SSE level at run-time and
register the proper callback, instead of relying on a compile-time
option for enabling or disabling it.
Is SSE2 the best version of SSE available on your platform? I see
there's SSE3 and SSE4 as well, maybe we should implement SSE2, SSE3
and SSE4.
The structure seems pretty good so far and is quite suitable for
runtime-detection of supported instruction sets/features.
SSE3 and SSE4 only added a couple of new instructions to the SSE2
instruction set - I guess we'll have to check if any of them could
actually be of any use to us ... otherwise we can simply stick with SSE2
I guess ..
Thanks,
Steve
------------------------------------------------------------------------------
EditLive Enterprise is the world's most technically advanced content
authoring tool. Experience the power of Track Changes, Inline Image
Editing and ensure content is compliant with Accessibility Checking.
http://p.sf.net/sfu/ephox-dev2dev
_______________________________________________
Freerdp-devel mailing list
Freerdp-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/freerdp-devel
------------------------------------------------------------------------------
EditLive Enterprise is the world's most technically advanced content
authoring tool. Experience the power of Track Changes, Inline Image
Editing and ensure content is compliant with Accessibility Checking.
http://p.sf.net/sfu/ephox-dev2dev
_______________________________________________
Freerdp-devel mailing list
Freerdp-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/freerdp-devel