Hi Folks,

This patch series introduces an SSE3 implementation of the blending
routines in Evas's common engine.

Why SSE3?
The lddqu instruction, introduced in SSE3, is faster than a typical
unaligned load when we load from, but do not store to, an unaligned
address that crosses a cache line. This lends itself well to the
blending functions, which operate on two separate arrays. We single-step
until the destination array reaches an aligned address, and from then on
use lddqu to load from the other, possibly unaligned, array.
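
To make the align-then-lddqu pattern concrete, here is a minimal sketch. It
is not code from the patches: add_pixel() and add_span_sse3() are
hypothetical names, and a per-channel saturating add stands in for the real
blend operation. The target("sse3") attribute is a GCC/Clang extension, used
here only so the sketch builds without -msse3.

```c
#include <stddef.h>
#include <stdint.h>
#include <pmmintrin.h>  /* SSE3: _mm_lddqu_si128 */

/* Hypothetical stand-in for a real blend op: saturating add of each
 * 8-bit channel of two 32-bit pixels. */
static uint32_t add_pixel(uint32_t s, uint32_t d)
{
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t c = ((s >> shift) & 0xff) + ((d >> shift) & 0xff);
        if (c > 0xff)
            c = 0xff;
        out |= c << shift;
    }
    return out;
}

/* Assumes src and dst are at least 4-byte aligned and do not overlap. */
__attribute__((target("sse3")))
static void add_span_sse3(const uint32_t *src, uint32_t *dst, size_t len)
{
    /* Single-step until dst sits on a 16-byte boundary. */
    while (len > 0 && ((uintptr_t)dst & 15) != 0) {
        *dst = add_pixel(*src, *dst);
        src++; dst++; len--;
    }

    /* Main loop, 4 pixels at a time: dst is aligned, so it gets aligned
     * loads and stores; src may be unaligned, so it is loaded with lddqu. */
    while (len >= 4) {
        __m128i s = _mm_lddqu_si128((const __m128i *)src);
        __m128i d = _mm_load_si128((const __m128i *)dst);
        _mm_store_si128((__m128i *)dst, _mm_adds_epu8(s, d));
        src += 4; dst += 4; len -= 4;
    }

    /* Scalar tail for the remaining 0..3 pixels. */
    while (len > 0) {
        *dst = add_pixel(*src, *dst);
        src++; dst++; len--;
    }
}
```

Only the destination is ever stored to, so aligning on it keeps every store
aligned, while the unaligned accesses are confined to loads from the source.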

Why do we need an SSE implementation?
GCC does perform some auto-vectorization, but it misses many
opportunities to use SSE, particularly when operating on packed
integers rather than floating-point values. With GCC 4.6.0 and
the CFLAGS listed below, the C implementation isn't vectorized, and
the MMX implementation's performance is suboptimal.

A few Expedite tests that demonstrate the performance impact:

Setup:
    Intel Atom N270, Intel 945GME, Expedite Xlib engine
    GCC 4.5.1  CFLAGS=-m32 -mtune=atom -O2 -msse3

Rect Blend:
    C:    21.80 FPS +/- 0.028674
    MMX:  27.41 FPS +/- 0.021344
    SSE3: 46.90 FPS +/- 0.376106

Image Blend Fade Unscaled:
    C:    15.46 FPS +/- 0.031314
    MMX:  24.92 FPS +/- 0.055902
    SSE3: 34.28 FPS +/- 0.099457

Image Blend Solid Fade Unscaled:
    C:    22.03 FPS +/- 0.097125
    MMX:  33.78 FPS +/- 0.190351
    SSE3: 46.86 FPS +/- 0.437874

Setup:
    Intel Atom N455, Intel GMA 3150, Expedite Xlib engine
    GCC 4.6.0 CFLAGS=-m32 -mtune=atom -O2 -msse3

Rect Blend:
    C:    32.68 FPS +/- 0.218510
    MMX:  29.75 FPS +/- 0.527105
    SSE3: 54.24 FPS +/- 0.870486

Image Blend Unscaled:
    C:    32.73 FPS +/- 0.359036
    MMX:  35.00 FPS +/- 1.099517
    SSE3: 50.93 FPS +/- 0.990806

Image Blend Occlude 3 Many:
    C:    24.25 FPS +/- 0.213135
    MMX:  25.87 FPS +/- 0.470124
    SSE3: 36.96 FPS +/- 0.505757

I'm sure there is further room for improvement.

Let me know what you guys think.

Thanks.


