Hi Eric,

On Wed, Jun 8, 2011 at 2:16 AM, S. Erisman <seris...@serisman.com <mailto:seris...@serisman.com>> wrote:

    Marc,

    I took your suggestions into account, revised my earlier patch,
    and committed my changes to a new fork:
    https://github.com/serisman/FreeRDP

    ... more comments below ...


    On 6/7/2011 9:29 PM, Marc-André Moreau wrote:

        Hi Steve,

        Well, that was fast :) I had started thinking of the different
        ways we can integrate this SSE acceleration within the rest in
        a clean way. I see two major options:

        1) The SSE code is part of the library, and can be disabled
        with a compile-time option. Methods for SSE and non-SSE have
        the same signature. At initialization, detection of SSE
        support level is performed, and a callback is registered to
        point to one of the methods (SSE or non-SSE).

    This is basically what my original patch was doing (minus having
    the compile-time option).


        2) The SSE code is part of a loadable sub-module, just like a
        plugin. detection is done in the "main" code and SSE support
        functions are loaded and registered dynamically. The only
        issue here is that there is not much value to doing it this
        way, since SSE support level can be detected and used
        dynamically anyway, and SSE is only available on the intel
        architecture. On systems where SSE is not available at all,
        such as ARM, the compile-time option should discard it completely.

    I agree with you that this seems like the wrong option.


        I think the part which needs to be abstracted is the "SIMD"
        system. The decoder should perform a check for available SIMD
        (SSE on intel, NEON on ARM) and use it if available, but when
        compiling for either intel or arm, only the relevant
        implementation should actually be compiled in. I think it
        would make sense to put the SSE code in separate files which
        are only included if FreeRDP is compiled for the architecture
        which makes sense for it. An analogy could be made with the
        way we currently deal with multiple cryptographic libraries.
        We have one file for abstracting each crypto library for our
        use, and at compile time only one of then is compiled, just
        like SSE would be compiled for intel only and NEON for arm only.

        What would you think of option 1), done in a similar way to
        the current cryptographic abstraction layer code?

    I have most of this done in my new fork (see above).  There is a
    configure option (--with-sse2) that is enabled by default that
    controls adding the sse2 .c/.h files and defining WITH_SSE2.  The
    _RFX_CONTEXT has a 'delegate' for the 1 method that has been
    optimized (so far).  The rfx_context_new method will set the
    delegate to the default non-sse software routine, and then call
    rfx_sse2_init if WITH_SSE2 is defined.  the rfx_sse2_init method
    then will check whether the CPU actually supports sse2 (future)
    and changes the delegate(s) to the sse optimized version(s).
    Eventually we could probably have the --with-sse2 config option
    smart enough to disable itself automatically based on which CPU
    arch was being compiled.  On the code side I separated all the SSE
    code into files in a new 'sse2' directory under libfreerdp-rfx.
     What was done with SSE could easily be duplicated for NEON for
    the ARM architecture as well.

    Is this a good enough structure to be able to continue adding more
    optimizations?


I was about to implement similar changes, you're on the good track. Yes, please keep going, I will likely improve on your code soon or I will at least code review it before it gets merged.

I would suggest the following for detection:

Apparently there's a "cpuinfo" instruction which can be used to detect which version of SSE is supported by the processor, if any. I think we should wrap such a call to get the supported SSE level at run-time and register the proper callback, instead of relying on a compile-time option for enabling or disabling it.

Is SSE2 the best version of SSE available on your platform? I see there's SSE3 and SSE4 as well, maybe we should implement SSE2, SSE3 and SSE4.
The structure seems pretty good so far and is quite suitable for runtime-detection of supported instruction sets/features.

SSE3 and SSE4 only added a couple of new instructions to the SSE2 instruction set - I guess we'll have to check if any of them could actually be of any use to us ... otherwise we can simply stick with SSE2 I guess ..


    Thanks,
     Steve



------------------------------------------------------------------------------
EditLive Enterprise is the world's most technically advanced content
authoring tool. Experience the power of Track Changes, Inline Image
Editing and ensure content is compliant with Accessibility Checking.
http://p.sf.net/sfu/ephox-dev2dev


_______________________________________________
Freerdp-devel mailing list
Freerdp-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/freerdp-devel

------------------------------------------------------------------------------
EditLive Enterprise is the world's most technically advanced content
authoring tool. Experience the power of Track Changes, Inline Image
Editing and ensure content is compliant with Accessibility Checking.
http://p.sf.net/sfu/ephox-dev2dev
_______________________________________________
Freerdp-devel mailing list
Freerdp-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/freerdp-devel

Reply via email to