On Wednesday 09 February 2011 08:23:51 Siarhei Siamashka wrote:
> On Wednesday 09 February 2011 05:28:46 cooolheater wrote:
> > Thank you for your kind explanation.
> > I used pixman-0.21.4 for testing.
> > As you guessed, we are using SIMD and are looking for a way to add NEON
> > acceleration.
> > Could you let me know the bilinear scaling interfaces in pixman and
> > where the SIMD optimization will be applied?
> 
> You can look here for the start:
> http://cgit.freedesktop.org/pixman/tree/pixman/pixman-bits-image.c?id=pixman-0.21.4#n189
> 
> But applying optimizations locally just to this small function is not
> going to provide the best performance. It's kind of like swinging a
> large polearm in a narrow passage: not so effective.

Attached is an example of such a patch. The performance improvement is not
impressive at all. Who cares if it's now, let's say, ~15x slower than nearest
scaling instead of ~30x?
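
For reference, the per-pixel arithmetic of the NEON sequence in the patch
below can be modelled in plain C. This is only a sketch: the name
bilinear_ref is made up for illustration, and it assumes pixels with four
8-bit channels and weights distx/disty in the 0..255 range. It should give
the same weighted sum as a straightforward four-tap C version, just arranged
as a vertical pass followed by a horizontal pass:

```c
#include <stdint.h>

/* Hypothetical scalar model of the NEON sequence (not part of pixman). */
static uint32_t
bilinear_ref (uint32_t tl, uint32_t tr, uint32_t bl, uint32_t br,
              int distx, int disty)
{
    uint32_t result = 0;
    int      shift;

    for (shift = 0; shift < 32; shift += 8)
    {
        uint32_t ctl = (tl >> shift) & 0xff;
        uint32_t ctr = (tr >> shift) & 0xff;
        uint32_t cbl = (bl >> shift) & 0xff;
        uint32_t cbr = (br >> shift) & 0xff;

        /* Vertical pass (vshll.u8 / vmlsl.u8 / vmlal.u8):
         * top * (256 - disty) + bottom * disty, fits in 16 bits. */
        uint32_t left  = ctl * (256 - disty) + cbl * disty;
        uint32_t right = ctr * (256 - disty) + cbr * disty;

        /* Horizontal pass (vshll.u16 / vmlsl.u16 / vmlal.u16):
         * left * (256 - distx) + right * distx, fits in 32 bits. */
        uint32_t c = left * (256 - distx) + right * distx;

        /* vshrn.u32 #16 followed by vmovn.u16: drop the 16 fraction bits. */
        result |= ((c >> 16) & 0xff) << shift;
    }

    return result;
}
```

With distx = disty = 0 this returns tl unchanged, which is a quick sanity
check for any replacement implementation.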

Obviously we need a better solution.
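
One direction, following the commit message below (handle the whole
scanline rather than one pixel per call), might have roughly the following
shape. This is a plain C sketch of the loop structure only, not a fast
implementation: bilinear_scanline_ref and its parameters are hypothetical,
x and ux are assumed to be 16.16 fixed point, and the caller is assumed to
guarantee that (x >> 16) + 1 stays inside both source rows.

```c
#include <stdint.h>

/* Hypothetical scanline-level routine (not part of pixman). */
static void
bilinear_scanline_ref (uint32_t       *dst,
                       const uint32_t *top,
                       const uint32_t *bottom,
                       int             width,
                       int32_t         x,      /* 16.16 start position */
                       int32_t         ux,     /* 16.16 step per pixel */
                       int             disty)  /* 0..255, fixed per row */
{
    /* The vertical weights do not change along the scanline. */
    const uint32_t wt = 256 - disty;
    const uint32_t wb = disty;
    int i, shift;

    for (i = 0; i < width; i++)
    {
        int      xi    = x >> 16;
        uint32_t distx = (x >> 8) & 0xff;
        uint32_t out   = 0;

        for (shift = 0; shift < 32; shift += 8)
        {
            uint32_t left  = ((top[xi]     >> shift) & 0xff) * wt +
                             ((bottom[xi]  >> shift) & 0xff) * wb;
            uint32_t right = ((top[xi + 1] >> shift) & 0xff) * wt +
                             ((bottom[xi + 1] >> shift) & 0xff) * wb;
            uint32_t c     = left * (256 - distx) + right * distx;

            out |= ((c >> 16) & 0xff) << shift;
        }

        dst[i] = out;
        x += ux;
    }
}
```

A NEON version of such a loop could then be unrolled to produce several
output pixels per iteration, with loads and multiplies interleaved, instead
of paying the register setup cost once per pixel.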

-- 
Best regards,
Siarhei Siamashka
From f26b7505d90b80f53391fc3d22fbe2d8a6bc20f7 Mon Sep 17 00:00:00 2001
From: Siarhei Siamashka <siarhei.siamas...@nokia.com>
Date: Thu, 10 Feb 2011 02:33:18 +0200
Subject: [PATCH] HACK: ARM: quick tweak to add NEON optimizations for bilinear scaling

Does not support NEON runtime autodetection. Introduces minimal
changes to the code, but the performance improvement is also very
far from what can actually be achieved (only roughly 2x faster
than C). A real optimization should take care of the whole scanline
and not just a single pixel, with proper loop unrolling and
instruction scheduling.
---
 pixman/pixman-bits-image.c |   47 +++++++++++++++++++++++++++++++++++++++----
 1 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/pixman/pixman-bits-image.c b/pixman/pixman-bits-image.c
index a865d71..c51ef27 100644
--- a/pixman/pixman-bits-image.c
+++ b/pixman/pixman-bits-image.c
@@ -231,6 +231,42 @@ bilinear_interpolation (uint32_t tl, uint32_t tr,
 
 #endif
 
+static force_inline void
+bilinear_interpolation_m (uint32_t * out,
+                          uint32_t   tl,
+                          uint32_t   tr,
+                          uint32_t   bl,
+                          uint32_t   br,
+                          int        distx,
+                          int        disty)
+{
+#ifdef __ARM_NEON__
+    asm volatile (
+        "vmov.32   d0[0], %[tl]\n"
+        "vmov.32   d1[0], %[bl]\n"
+        "vmov.32   d0[1], %[tr]\n"
+        "vmov.32   d1[1], %[br]\n"
+        "vshll.u8  q1, d0, #8\n"
+        "vdup.u8   d7, %[disty]\n"
+        "vdup.u16  d6, %[distx]\n"
+        "vmlsl.u8  q1, d0, d7\n"
+        "vmlal.u8  q1, d1, d7\n"
+        "vshll.u16 q0, d2, #8\n"
+        "vmlsl.u16 q0, d2, d6\n"
+        "vmlal.u16 q0, d3, d6\n"
+        "vshrn.u32 d0, q0, #16\n"
+        "vmovn.u16 d0, q0\n"
+        "vst1.32   {d0[0]}, [%[out], :32]\n"
+        :
+        : [tl] "r" (tl), [tr] "r" (tr), [bl] "r" (bl), [br] "r" (br),
+          [out] "r" (out), [disty] "r" (disty), [distx] "r" (distx)
+        : "memory", "d0", "d1", "d2", "d3", "d6", "d7"
+    );
+#else
+    *out = bilinear_interpolation (tl, tr, bl, br, distx, disty);
+#endif
+}
+
 static force_inline uint32_t
 bits_image_fetch_pixel_bilinear (bits_image_t   *image,
 				 pixman_fixed_t  x,
@@ -424,7 +460,8 @@ bits_image_fetch_bilinear_no_repeat_8888 (pixman_image_t * ima,
 
 	distx = (x >> 8) & 0xff;
 
-	*buffer++ = bilinear_interpolation (0, tr, 0, br, distx, disty);
+	bilinear_interpolation_m (buffer, 0, tr, 0, br, distx, disty);
+	buffer++;
 
 	x += ux;
 	x_top += ux_top;
@@ -449,7 +486,7 @@ bits_image_fetch_bilinear_no_repeat_8888 (pixman_image_t * ima,
 
 	    distx = (x >> 8) & 0xff;
 
-	    *buffer = bilinear_interpolation (tl, tr, bl, br, distx, disty);
+	    bilinear_interpolation_m (buffer, tl, tr, bl, br, distx, disty);
 	}
 
 	buffer++;
@@ -473,7 +510,7 @@ bits_image_fetch_bilinear_no_repeat_8888 (pixman_image_t * ima,
 
 	    distx = (x >> 8) & 0xff;
 
-	    *buffer = bilinear_interpolation (tl, 0, bl, 0, distx, disty);
+	    bilinear_interpolation_m (buffer, tl, 0, bl, 0, distx, disty);
 	}
 
 	buffer++;
@@ -895,8 +932,8 @@ bits_image_fetch_bilinear_affine (pixman_image_t * image,
 	    }
 	}
 
-	buffer[i] = bilinear_interpolation (
-	    tl, tr, bl, br, distx, disty);
+	bilinear_interpolation_m (
+	    &buffer[i], tl, tr, bl, br, distx, disty);
 
     next:
 	x += ux;
-- 
1.7.3.4
