Hi Folks,

This patch series introduces an SSE3 implementation of Evas's common engine blending routines.

Why SSE3?:
The lddqu instruction, introduced in SSE3, is faster than a typical unaligned load when we load from, but do not store to, an unaligned address which crosses a cache line. This lends itself well to the blending functions, which operate on two separate arrays. We single-step until the destination pointer is 16-byte aligned, then use aligned loads and stores for the destination and lddqu to load from the other, possibly unaligned, array.

Why do we need an SSE implementation?:
GCC does perform some auto-vectorization, but it misses a lot of opportunities for leveraging SSE, specifically when operating on packed integers as opposed to floating-point. With GCC 4.6.0 and the CFLAGS listed below, the C implementation isn't vectorized, and the performance of the MMX implementation is suboptimal.

A few tests which demonstrate the performance impact:

Setup: Intel Atom N270, Intel 945GME, Expedite Xlib engine
GCC 4.5.1
CFLAGS=-m32 -mtune=atom -O2 -msse3

Rect Blend:
  C:    21.80 FPS +/- 0.028674
  MMX:  27.41 FPS +/- 0.021344
  SSE3: 46.90 FPS +/- 0.376106

Image Blend Fade Unscaled:
  C:    15.46 FPS +/- 0.031314
  MMX:  24.92 FPS +/- 0.055902
  SSE3: 34.28 FPS +/- 0.099457

Image Blend Solid Fade Unscaled:
  C:    22.03 FPS +/- 0.097125
  MMX:  33.78 FPS +/- 0.190351
  SSE3: 46.86 FPS +/- 0.437874

Setup: Intel Atom N455, Intel GMA 3150, Expedite Xlib engine
GCC 4.6.0
CFLAGS=-m32 -mtune=atom -O2 -msse3

Rect Blend:
  C:    32.68 FPS +/- 0.218510
  MMX:  29.75 FPS +/- 0.527105
  SSE3: 54.24 FPS +/- 0.870486

Image Blend Unscaled:
  C:    32.73 FPS +/- 0.359036
  MMX:  35.00 FPS +/- 1.099517
  SSE3: 50.93 FPS +/- 0.990806

Image Blend Occlude 3 Many:
  C:    24.25 FPS +/- 0.213135
  MMX:  25.87 FPS +/- 0.470124
  SSE3: 36.96 FPS +/- 0.505757

I'm sure there is further room for improvement. Let me know what you guys think.

Thanks.
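
P.S. For anyone who wants the alignment scheme spelled out, here is a minimal sketch of the "single-step until the destination is aligned, then lddqu the source" loop described above. It is not code from the patches: the names add_span_sse3 and add_pixel_sat are hypothetical, and a saturating per-channel add stands in for the real blend math to keep it short. It assumes GCC or Clang on x86 and pixels stored as 32-bit words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <pmmintrin.h> /* SSE3 intrinsics, including _mm_lddqu_si128 */

/* Hypothetical scalar stand-in for the blend: saturating add per 8-bit channel. */
static uint32_t add_pixel_sat(uint32_t s, uint32_t d)
{
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t c = ((s >> shift) & 0xff) + ((d >> shift) & 0xff);
        if (c > 0xff)
            c = 0xff;
        out |= c << shift;
    }
    return out;
}

/* Hypothetical span function: dst[i] = sat_add(src[i], dst[i]) for len pixels.
 * The target attribute lets this build even without -msse3 on the command line. */
__attribute__((target("sse3")))
static void add_span_sse3(const uint32_t *src, uint32_t *dst, size_t len)
{
    /* Single-step until the destination pointer is 16-byte aligned. */
    while (len > 0 && ((uintptr_t)dst & 15) != 0) {
        *dst = add_pixel_sat(*src, *dst);
        src++; dst++; len--;
    }

    /* dst is now aligned (assuming it was at least 4-byte aligned to begin with). */
    assert(len < 4 || ((uintptr_t)dst & 15) == 0);

    /* Main loop, 4 pixels at a time: src may still be unaligned, so it is
     * loaded with lddqu; dst uses aligned load and store. */
    while (len >= 4) {
        __m128i s = _mm_lddqu_si128((const __m128i *)src);
        __m128i d = _mm_load_si128((const __m128i *)dst);
        _mm_store_si128((__m128i *)dst, _mm_adds_epu8(s, d));
        src += 4; dst += 4; len -= 4;
    }

    /* Scalar tail for the remaining 0..3 pixels. */
    while (len > 0) {
        *dst = add_pixel_sat(*src, *dst);
        src++; dst++; len--;
    }
}
```

Only the destination alignment is fixed up because it is both read and written; the source is read-only, which is the case where lddqu is meant to help.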