Re: [Pixman] [PATCH 1/4] bits: Implement PAD support in the simple fetcher

2013-01-30 Thread Jeff Muizelaar

On 2013-01-30, at 9:14 PM, Søren Sandmann wrote:

 
 Obviously this means that in this case, we're not getting the benefits of
 any platform-specific fast paths. Perhaps what we need is a pad
 equivalent of fast_composite_tiled_repeat()?
 
 This might be a good idea, though it's still worth finding out what is
 causing an untransformed image with PAD repeat to be drawn without a
 mask onto a destination bigger than source image. Either the resulting
 stripes are desirable for some reason, or Firefox should be more careful
 about clipping to the source image.

If I had to guess, I'd say this is unintentional. There's been a bunch of
churn over the semantics of drawImage with a source rect larger than the
image. At one time, the correct behaviour was to use EXTEND_PAD; however, I
believe the currently spec'd behaviour is that we should be clamping the
source rect to the size of the image. I'm not sure we've switched to this
yet, but I'll take a look tomorrow.
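
To make the two behaviours concrete, here is a minimal sketch in C. The
names pad_coord, rect_t and clamp_source_rect are hypothetical; they are not
pixman, cairo or Firefox APIs. The sketch only contrasts padding
out-of-range samples with clamping the source rectangle up front.

```c
#include <assert.h>

/* Hypothetical helper: PAD repeat maps any coordinate outside
 * [0, size - 1] to the nearest edge pixel. Sampling past the edge
 * therefore repeats the last row or column, which shows up as stripes. */
static int
pad_coord (int x, int size)
{
    assert (size > 0);
    if (x < 0)
        return 0;
    if (x >= size)
        return size - 1;
    return x;
}

/* Hypothetical rectangle type, for illustration only. */
typedef struct { int x, y, width, height; } rect_t;

/* Clamp a source rectangle to an image of the given size, so that no
 * sample is ever taken outside the image. */
static rect_t
clamp_source_rect (rect_t r, int image_width, int image_height)
{
    int x0 = r.x < 0 ? 0 : r.x;
    int y0 = r.y < 0 ? 0 : r.y;
    int x1 = r.x + r.width;
    int y1 = r.y + r.height;
    rect_t out;

    if (x1 > image_width)
        x1 = image_width;
    if (y1 > image_height)
        y1 = image_height;

    out.x = x0;
    out.y = y0;
    out.width = x1 > x0 ? x1 - x0 : 0;
    out.height = y1 > y0 ? y1 - y0 : 0;
    return out;
}
```

A real caller would presumably also have to shrink the destination rect to
match; the sketch leaves that out.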

-Jeff
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH] Add a version of bilinear_interpolation for precision <=4

2013-01-28 Thread Jeff Muizelaar

On 2013-01-28, at 4:28 PM, Siarhei Siamashka wrote:

 On Thu, 24 Jan 2013 15:35:24 -0500
 Jeff Muizelaar jmuizel...@mozilla.com wrote:
 
 Having 4 or fewer bits means we can do two components at
 a time in a single 32 bit register.
 
 Here are the results for firefox-fishtank on a Pandaboard with
 4.6.3 and PIXMAN_DISABLE=arm-neon
 
 Before:
 [ # ]  backend  test                min(s)  median(s)  stddev.  count
 [  0]  image    t-firefox-fishtank  7.841   7.910      0.70%    6/6
 
 After:
 [ # ]  backend  test                min(s)  median(s)  stddev.  count
 [  0]  image    t-firefox-fishtank  6.951   6.995      1.11%    6/6
 
 That's pretty cool, thanks. I just wonder if you might be also possibly
 interested in a 4-bit bilinear scaling for SSE2 backend? It could
 provide quite a significant performance improvement.

Yes, definitely interested. We would ship 4-bit precision on desktop if it
was noticeably faster.

-Jeff



Re: [Pixman] 0.29.2

2013-01-27 Thread Jeff Muizelaar

On 2013-01-27, at 2:43 PM, Siarhei Siamashka wrote:

 I was a bit worried about the projective transforms. The accuracy is
 improved, but the new code uses a rather naive and slow implementation
 for long division. So the performance is going to be worse. But as
 almost nobody seems to be using projective transforms (not cairo at
 least) and the current implementation itself can be hardly considered
 optimized at all, this should not be a big problem.

FWIW, we currently use pixman's projective transforms in Firefox to
implement CSS 3D transforms on platforms without hardware acceleration.
Making it slower isn't great.
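
For context, a projective transform needs a division for every transformed
point, which an affine transform does not. The sketch below shows that
step in C with doubles. It is only an illustration: mat3_t and
project_point are hypothetical names, and pixman itself works in fixed
point rather than floating point.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 3x3 matrix type, row-major, for illustration only. */
typedef struct { double m[3][3]; } mat3_t;

/* Apply a projective transform to (x, y).
 * Returns 0 if the point maps to infinity (w == 0), 1 otherwise. */
static int
project_point (const mat3_t *t, double x, double y,
               double *out_x, double *out_y)
{
    double px, py, w;

    assert (t != NULL && out_x != NULL && out_y != NULL);

    px = t->m[0][0] * x + t->m[0][1] * y + t->m[0][2];
    py = t->m[1][0] * x + t->m[1][1] * y + t->m[1][2];
    w  = t->m[2][0] * x + t->m[2][1] * y + t->m[2][2];

    if (w == 0.0)
        return 0;

    /* The homogeneous divide: for an affine matrix the bottom row is
     * (0, 0, 1), so w is always 1 and this step disappears. */
    *out_x = px / w;
    *out_y = py / w;
    return 1;
}
```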

-Jeff



[Pixman] [PATCH] Add a version of bilinear_interpolation for precision <=4

2013-01-24 Thread Jeff Muizelaar
Having 4 or fewer bits means we can do two components at
a time in a single 32 bit register.
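
The arithmetic behind that claim: with n bits of precision the four
bilinear weights sum to 2^(2n), so an 8-bit channel times the weights needs
8 + 2n bits. For n <= 4 that is at most 16 bits (255 * 256 = 65280), so two
channels placed 16 bits apart never spill into each other. Below is a
minimal C sketch of this, separate from the patch itself; blend_channel and
blend_two_channels are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: weighted sum of a single 8-bit channel.
 * The four weights must sum to 256 (16 * 16). */
static uint32_t
blend_channel (uint32_t a, uint32_t b, uint32_t c, uint32_t d,
               uint32_t wa, uint32_t wb, uint32_t wc, uint32_t wd)
{
    assert (wa + wb + wc + wd == 256);
    return (a * wa + b * wb + c * wc + d * wd) >> 8;
}

/* Same computation for the channels in bits 0-7 and 16-23 of each
 * pixel at once. Each 16-bit lane holds at most 255 * 256 = 65280,
 * so the low lane never carries into the high lane and the high lane
 * never overflows the 32-bit register. */
static uint32_t
blend_two_channels (uint32_t p0, uint32_t p1, uint32_t p2, uint32_t p3,
                    uint32_t w0, uint32_t w1, uint32_t w2, uint32_t w3)
{
    uint32_t sum;

    assert (w0 + w1 + w2 + w3 == 256);

    sum  = (p0 & 0xff00ff) * w0;
    sum += (p1 & 0xff00ff) * w1;
    sum += (p2 & 0xff00ff) * w2;
    sum += (p3 & 0xff00ff) * w3;

    /* Drop the 8 fractional bits of each lane. */
    return (sum >> 8) & 0xff00ff;
}
```

The other two channels can be handled the same way after shifting the
pixel right by 8, giving two multiplies per source pixel instead of four.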

Here are the results for firefox-fishtank on a Pandaboard with
4.6.3 and PIXMAN_DISABLE=arm-neon

Before:
[ # ]  backend  test                min(s)  median(s)  stddev.  count
[  0]  image    t-firefox-fishtank  7.841   7.910      0.70%    6/6

After:
[ # ]  backend  test                min(s)  median(s)  stddev.  count
[  0]  image    t-firefox-fishtank  6.951   6.995      1.11%    6/6
---
 pixman/pixman-inlines.h | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/pixman/pixman-inlines.h b/pixman/pixman-inlines.h
index ab4def0..a03d967 100644
--- a/pixman/pixman-inlines.h
+++ b/pixman/pixman-inlines.h
@@ -88,6 +88,42 @@ pixman_fixed_to_bilinear_weight (pixman_fixed_t x)
            ((1 << BILINEAR_INTERPOLATION_BITS) - 1);
 }
 
+#if BILINEAR_INTERPOLATION_BITS <= 4
+/* Inspired by Filter_32_opaque from Skia */
+static force_inline uint32_t
+bilinear_interpolation(uint32_t tl, uint32_t tr,
+                       uint32_t bl, uint32_t br,
+                       int distx, int disty)
+{
+    int distxy, distxiy, distixy, distixiy;
+    uint32_t lo, hi;
+
+    distx <<= (4 - BILINEAR_INTERPOLATION_BITS);
+    disty <<= (4 - BILINEAR_INTERPOLATION_BITS);
+
+    distxy = distx * disty;
+    distxiy = (distx << 4) - distxy;   /* distx * (16 - disty) */
+    distixy = (disty << 4) - distxy;   /* disty * (16 - distx) */
+    distixiy =
+        16 * 16 - (disty << 4) -
+        (distx << 4) + distxy; /* (16 - distx) * (16 - disty) */
+
+    lo = (tl & 0xff00ff) * distixiy;
+    hi = ((tl >> 8) & 0xff00ff) * distixiy;
+
+    lo += (tr & 0xff00ff) * distxiy;
+    hi += ((tr >> 8) & 0xff00ff) * distxiy;
+
+    lo += (bl & 0xff00ff) * distixy;
+    hi += ((bl >> 8) & 0xff00ff) * distixy;
+
+    lo += (br & 0xff00ff) * distxy;
+    hi += ((br >> 8) & 0xff00ff) * distxy;
+
+    return ((lo >> 8) & 0xff00ff) | (hi & ~0xff00ff);
+}
+
+#else
 #if SIZEOF_LONG > 4
 
 static force_inline uint32_t
@@ -184,6 +220,7 @@ bilinear_interpolation (uint32_t tl, uint32_t tr,
 }
 
 #endif
+#endif // BILINEAR_INTERPOLATION_BITS <= 4
 
 /*
  * For each scanline fetched from source image with PAD repeat:
-- 
1.8.0.19.gf369db1



[Pixman] [PATCH] Add scaled nearest repeat fast paths

2012-06-25 Thread Jeff Muizelaar
Siarhei wrote this patch and we've been using it in the Mozilla tree since May.

Before this patch, it was often faster to scale and repeat in two passes,
because each pass hit a fast path, whereas the single-pass approach fell
back to the slow path. With this patch, the single-pass approach has
competitive performance.
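
As a rough picture of what "single pass" means here, the C sketch below
scales and tiles one scanline in the same loop. It is not the code from the
attached patch; repeat_coord and scale_nearest_repeat_row are hypothetical
names, and the 16.16 fixed-point stepping is an assumption made for the
example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: wrap a coordinate into [0, size) for tiling
 * (NORMAL) repeat, including negative coordinates. */
static int
repeat_coord (int x, int size)
{
    assert (size > 0);
    x %= size;
    return x < 0 ? x + size : x;
}

/* Fill one destination scanline from one source scanline in a single
 * pass: step through the source in 16.16 fixed point, take the pixel
 * containing that position, and wrap the coordinate as we go.
 * Note: x >> 16 on a negative value relies on an arithmetic right
 * shift, which is implementation-defined in C but the usual behaviour. */
static void
scale_nearest_repeat_row (uint32_t *dst, int dst_width,
                          const uint32_t *src, int src_width,
                          int32_t x_start, int32_t x_step)
{
    int32_t x = x_start;
    int i;

    for (i = 0; i < dst_width; i++)
    {
        dst[i] = src[repeat_coord (x >> 16, src_width)];
        x += x_step;
    }
}
```

The two-pass alternative would first tile the source into a temporary
buffer and then scale that buffer, at the cost of the extra copy.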



patch
Description: Binary data


[Pixman] [PATCH] Add fast paths for bilinear scaling

2012-06-25 Thread Jeff Muizelaar
This patch adds fast paths for bilinear scaling of (SRC, r5g6b5, r5g6b5),
(OVER, a8r8g8b8, r5g6b5), and (OVER, a8r8g8b8, a8r8g8b8). These make a
noticeable improvement in the performance of Firefox on Android.
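
For readers unfamiliar with the format names: r5g6b5 packs a pixel into 16
bits (5 red, 6 green, 5 blue) and a8r8g8b8 into 32 bits with 8 bits per
channel. Here is a minimal C sketch of converting between the two layouts.
The helper names pack_565 and unpack_565 are hypothetical and are not
pixman's; the patch itself is not shown in this message.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the colour channels of an a8r8g8b8 pixel into r5g6b5 by
 * dropping the low bits of each channel. Alpha is discarded. */
static uint16_t
pack_565 (uint32_t p)
{
    uint32_t r = (p >> 16) & 0xff;
    uint32_t g = (p >> 8) & 0xff;
    uint32_t b = p & 0xff;

    return (uint16_t) (((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Expand r5g6b5 to an opaque a8r8g8b8 pixel. The top bits of each
 * channel are replicated into the low bits so that full intensity
 * (0x1f or 0x3f) maps to 0xff rather than 0xf8 or 0xfc. */
static uint32_t
unpack_565 (uint16_t p)
{
    uint32_t r = (p >> 11) & 0x1f;
    uint32_t g = (p >> 5) & 0x3f;
    uint32_t b = p & 0x1f;

    r = (r << 3) | (r >> 2);
    g = (g << 2) | (g >> 4);
    b = (b << 3) | (b >> 2);

    return 0xff000000u | (r << 16) | (g << 8) | b;
}
```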

-Jeff


Re: [Pixman] [PATCH] Add fast paths for bilinear scaling

2012-06-25 Thread Jeff Muizelaar
And here's the patch.


patch
Description: Binary data




Re: [Pixman] [PATCH] sse2: Using MMX and SSE 4.1

2012-05-09 Thread Jeff Muizelaar

On 2012-05-09, at 12:57 PM, Søren Sandmann wrote:

 Matt Turner matts...@gmail.com writes:
 
 I started porting my src__0565 MMX function to SSE2, and in the
 process started thinking about using SSE3+. The useful instructions
 added post SSE2 that I see are
  SSE3:   lddqu - for unaligned loads across cache lines
 
 I don't really understand that instruction. Isn't it identical to
 movdqu?  Or is the idea that lddqu is faster than movdqu for cache line
 splits, but slower for plain old, non-cache split unaligned loads?

The instructions movdqu, movups, movupd and lddqu are all able to read 
unaligned vectors. lddqu is faster than the alternatives on P4E and PM 
processors, but requires the SSE3 instruction set. The unaligned read 
instructions are relatively slow on older processors, but faster on Nehalem, 
Sandy Bridge and on future AMD and Intel processors.

From http://www.agner.org/optimize/optimizing_assembly.pdf
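
In C, both loads are reachable through intrinsics: _mm_loadu_si128 (SSE2,
normally emitted as movdqu) and _mm_lddqu_si128 (SSE3, lddqu). The sketch
below picks one at compile time. The helper name load_store_16 is
hypothetical, and the memcpy fallback is only there so the example builds
on targets without SSE2.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE3__)
#include <pmmintrin.h>   /* _mm_lddqu_si128 (SSE3); also pulls in SSE2 */
#elif defined(__SSE2__)
#include <emmintrin.h>   /* _mm_loadu_si128 (SSE2) */
#endif

/* Hypothetical helper: copy 16 bytes between buffers that need not be
 * 16-byte aligned, using whichever unaligned load is available. */
static void
load_store_16 (uint8_t *dst, const uint8_t *src)
{
    assert (dst != NULL && src != NULL);

#if defined(__SSE3__)
    __m128i v = _mm_lddqu_si128 ((const __m128i *) src);   /* lddqu */
    _mm_storeu_si128 ((__m128i *) dst, v);
#elif defined(__SSE2__)
    __m128i v = _mm_loadu_si128 ((const __m128i *) src);   /* movdqu */
    _mm_storeu_si128 ((__m128i *) dst, v);
#else
    memcpy (dst, src, 16);                                  /* portable fallback */
#endif
}
```

Both intrinsics return the same data; any difference is in speed on a
given processor, which would have to be measured.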

-Jeff