On Thu, 6 Dec 2012 19:45:44 +0200 Siarhei Siamashka <siarhei.siamas...@gmail.com> wrote:
> ARMv6 has the UQADD8 instruction, which implements unsigned saturated
> addition for 8-bit values packed in 32-bit registers. It is very useful
> for the UN8x4_ADD_UN8x4, UN8_rb_ADD_UN8_rb and ADD_UN8 macros (which would
> otherwise need a lot of arithmetic operations to simulate this operation).
> Since most of the major ARM Linux distros are built for ARMv7, we are
> much less dependent on runtime CPU detection and can get practical
> benefits from conditional compilation here for a lot of users.
>
> The results of the cairo-perf-trace benchmark on ARM Cortex-A15 with pixman
> compiled by gcc 4.7.2 and PIXMAN_DISABLE set to "arm-simd arm-neon":
>
> Speedups
> ========
> image firefox-talos-gfx     (29938.22 0.12%) -> (27814.76 0.51%)   : 1.08x speedup
> image firefox-asteroids     (23241.11 0.07%) -> (21795.19 0.07%)   : 1.07x speedup
> image firefox-canvas-alpha  (174519.85 0.08%) -> (164788.64 0.20%) : 1.06x speedup
> image poppler               (9464.46 1.61%) -> (8991.53 0.14%)     : 1.05x speedup
> ---
>  pixman/pixman-combine32.h |   47 +++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 47 insertions(+), 0 deletions(-)

Forgot to mention: the benchmark numbers above assume that the patch for
faster combine_over_u has already been applied to pixman:
http://lists.freedesktop.org/archives/pixman/2012-November/002384.html

If we apply only this UQADD8 patch and compare the performance with
current pixman git, we get:

Speedups
========
image firefox-paintball     (622686.54 0.03%) -> (566993.97 0.10%) : 1.10x speedup
image chromium-tabs         (737.67 0.12%) -> (682.36 0.27%)       : 1.08x speedup
image firefox-fishtank      (513843.85 0.06%) -> (479705.23 0.12%) : 1.07x speedup
image firefox-talos-gfx     (29954.45 0.18%) -> (28382.82 0.55%)   : 1.07x speedup
image firefox-asteroids     (24591.65 0.14%) -> (23239.72 0.10%)   : 1.06x speedup
image firefox-canvas-alpha  (190829.98 0.08%) -> (180617.98 0.05%) : 1.06x speedup
image poppler               (9484.97 0.06%) -> (8998.34 0.06%)     : 1.06x speedup
image firefox-fishbowl      (421040.07 0.06%) -> (400184.18 0.15%) : 1.05x speedup
image firefox-canvas        (90428.10 0.06%) -> (86074.26 0.11%)   : 1.05x speedup

If we apply both the UQADD8 and combine_over_u patches and compare the
performance with current pixman git, we get:

Speedups
========
image firefox-paintball     (622686.54 0.03%) -> (426471.59 0.03%) : 1.46x speedup
image firefox-fishtank      (513843.85 0.06%) -> (375270.57 0.12%) : 1.37x speedup
image firefox-canvas        (90428.10 0.06%) -> (67424.41 0.02%)   : 1.34x speedup
image firefox-fishbowl      (421040.07 0.06%) -> (356533.95 0.11%) : 1.18x speedup
image firefox-talos-svg     (127530.61 0.02%) -> (108182.31 0.10%) : 1.18x speedup
image firefox-canvas-alpha  (190829.98 0.08%) -> (164788.64 0.20%) : 1.16x speedup
image firefox-asteroids     (24591.65 0.14%) -> (21795.19 0.07%)   : 1.13x speedup
image firefox-particles     (214047.38 0.10%) -> (194802.61 0.03%) : 1.10x speedup
image swfdec-youtube        (8437.75 2.06%) -> (7692.13 0.63%)     : 1.10x speedup
image chromium-tabs         (737.67 0.12%) -> (681.24 0.24%)       : 1.08x speedup
image firefox-talos-gfx     (29954.45 0.18%) -> (27814.76 0.51%)   : 1.08x speedup
image firefox-chalkboard    (156512.01 0.08%) -> (147481.59 0.12%) : 1.06x speedup
image poppler               (9484.97 0.06%) -> (8991.53 0.14%)     : 1.06x speedup

The UQADD8 patch helps in the translucent cases, while the combine_over_u
patch helps in the fully transparent and opaque cases (alpha is 0x00 or
0xFF). The two work quite nicely together.

-- 
Best regards,
Siarhei Siamashka
_______________________________________________
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman
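A minimal sketch of why the two patches complement each other, assuming premultiplied a8r8g8b8 pixels. A per-pixel OVER can return early when the source alpha is 0xFF or the source is zero. Only a translucent source reaches the multiply plus the per-byte saturated add, and UQADD8 replaces that add. All helper names (rb_add_sat, un8x4_add_sat, un8x4_mul_un8, over_pixel) are hypothetical. They are not pixman's macros or its actual combine_over_u code. Selecting UQADD8 through `__ARM_FEATURE_SIMD32` is also an assumption about how a compile-time switch might look.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; not pixman's real macros. */

/* Saturating add of the two 8-bit lanes in bits 0-7 and 16-23
 * (the "rb" layout, in the spirit of UN8_rb_ADD_UN8_rb). */
static inline uint32_t
rb_add_sat (uint32_t x, uint32_t y)
{
    uint32_t t = (x & 0x00ff00ffu) + (y & 0x00ff00ffu);

    /* A carry into bit 8 or bit 24 means that lane overflowed.
     * Subtracting the carry bits from 0x10000100 turns each carry
     * into 0xff in its lane; OR-ing it in clamps the lane at 255. */
    t |= 0x10000100u - ((t >> 8) & 0x00ff00ffu);
    return t & 0x00ff00ffu;
}

/* Per-byte saturating add of four packed 8-bit channels. */
static inline uint32_t
un8x4_add_sat (uint32_t x, uint32_t y)
{
#if defined(__arm__) && defined(__ARM_FEATURE_SIMD32)
    /* Assumption: the compiler defines __ARM_FEATURE_SIMD32 for ARMv6+
     * targets, so a single UQADD8 does the whole operation. */
    uint32_t r;
    __asm__ ("uqadd8 %0, %1, %2" : "=r" (r) : "r" (x), "r" (y));
    return r;
#else
    /* Portable fallback: handle the r/b and a/g byte pairs separately. */
    return rb_add_sat (x, y) | (rb_add_sat (x >> 8, y >> 8) << 8);
#endif
}

/* Per-channel x * a / 255 with rounding, for a in 0..255. */
static inline uint32_t
un8x4_mul_un8 (uint32_t x, uint32_t a)
{
    uint32_t rb = (x & 0x00ff00ffu) * a + 0x00800080u;
    uint32_t ag = ((x >> 8) & 0x00ff00ffu) * a + 0x00800080u;

    /* (t + (t >> 8)) >> 8 divides each 16-bit lane by 255 */
    rb = ((rb + ((rb >> 8) & 0x00ff00ffu)) >> 8) & 0x00ff00ffu;
    ag = (ag + ((ag >> 8) & 0x00ff00ffu)) & 0xff00ff00u;
    return rb | ag;
}

/* OVER for one premultiplied pixel: s + d * (255 - alpha(s)) / 255. */
static inline uint32_t
over_pixel (uint32_t s, uint32_t d)
{
    uint32_t a = s >> 24;

    if (a == 0xff)      /* opaque source: the result is just s */
        return s;
    if (s == 0)         /* fully transparent source: d is unchanged */
        return d;

    /* translucent source: multiply, then saturating add */
    return un8x4_add_sat (un8x4_mul_un8 (d, 255 - a), s);
}
```

On a non-ARM build the portable branch is compiled. For example, un8x4_add_sat (0xff80f001, 0x01801002) yields 0xffffff03: the three overflowing bytes clamp to 0xff and the lowest byte is 0x03.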