pixman: Changes to 'debian-unstable'

Andreas Boll Wed, 04 Nov 2015 04:52:52 -0800

 ChangeLog                      |  667 +++++++++++++++++++++++++++++++++++++++++
 configure.ac                   |   22 -
 debian/changelog               |    7 
 pixman/pixman-arm-simd-asm.S   |   41 ++
 pixman/pixman-arm-simd.c       |    6 
 pixman/pixman-general.c        |   18 -
 pixman/pixman-implementation.c |   16 
 pixman/pixman-mmx.c            |   64 ---
 pixman/pixman-vmx.c            |  492 ++++++++++++------------------
 pixman/pixman.c                |   17 -
 test/Makefile.sources          |    2 
 test/affine-bench.c            |   24 +
 test/cover-test.c              |  449 +++++++++++++++++++++++++++
 test/fence-image-self-test.c   |  239 ++++++++++++++
 test/lowlevel-blt-bench.c      |    6 
 test/scaling-test.c            |   66 ++--
 test/utils.c                   |  133 +++++++-
 test/utils.h                   |   21 +
 18 files changed, 1873 insertions(+), 417 deletions(-)


New commits:
commit 017a59ec26f3d70b577ddf868551f16198806f81
Author: Andreas Boll <[email protected]>
Date:   Wed Nov 4 13:26:38 2015 +0100

    Upload to unstable

diff --git a/debian/changelog b/debian/changelog
index be437ce..98410b4 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,8 +1,9 @@
-pixman (0.33.4-1) UNRELEASED; urgency=medium
+pixman (0.33.4-1) unstable; urgency=medium
 
+  * Team upload.
   * New upstream release candidate.
 
- -- Andreas Boll <[email protected]>  Wed, 04 Nov 2015 10:30:37 +0100
+ -- Andreas Boll <[email protected]>  Wed, 04 Nov 2015 13:26:18 +0100
 
 pixman (0.33.2-2) sid; urgency=medium
 

commit c19373008340e1ad159ded12e45275b5b06bb513
Author: Andreas Boll <[email protected]>
Date:   Wed Nov 4 10:30:58 2015 +0100

    Bump changelogs.

diff --git a/ChangeLog b/ChangeLog
index 96b8c28..9a56a72 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,670 @@
+commit fa71d08a81c9bf3f2366ee45474ff868d9e10b8e
+Author: Oded Gabbay <[email protected]>
+Date:   Fri Oct 23 17:58:49 2015 +0300
+
+    Pre-release version bump to 0.33.4
+    
+    Signed-off-by: Oded Gabbay <[email protected]>
+
+commit 9728241bd098bc4260e6cd83997dfecc64adc356
+Author: Andrea Canciani <[email protected]>
+Date:   Tue Oct 13 13:35:59 2015 +0200
+
+    test: Fix fence-image-self-test on Mac
+    
+    On MacOS X, according to the manpage of mprotect(), "When a program
+    violates the protections of a page, it gets a SIGBUS or SIGSEGV
+    signal.", but fence-image-self-test was only accepting a SIGSEGV as
+    notification of invalid access.
+    
+    Fixes fence-image-self-test
+    
+    Reviewed-by: Pekka Paalanen <[email protected]>
+
+commit 7de61d8d14e84623b6fa46506eb74f938287f536
+Author: Matt Turner <[email protected]>
+Date:   Sun Oct 11 14:44:46 2015 -0700
+
+    mmx: Use MMX2 intrinsics from xmmintrin.h directly.
+    
+    We had lots of hacks to handle the inability to include xmmintrin.h
+    without compiling with -msse (lest SSE instructions be used in
+    pixman-mmx.c). Some recent version of gcc relaxed this restriction.
+    
+    Change configure.ac to test that xmmintrin.h can be included and that we
+    can use some intrinsics from it, and remove the work-around code from
+    pixman-mmx.c.
+    
+    Evidently allows gcc 4.9.3 to optimize better as well:
+    
+       text       data     bss     dec     hex filename
+     657078      30848     680  688606   a81de libpixman-1.so.0.33.3 before
+     656710      30848     680  688238   a806e libpixman-1.so.0.33.3 after
+    
+    Reviewed-by: Siarhei Siamashka <[email protected]>
+    Tested-by: Pekka Paalanen <[email protected]>
+    Signed-off-by: Matt Turner <[email protected]>
+
+commit 90e62c086766afffd289a321c7de8ea4b5cac87d
+Author: Siarhei Siamashka <[email protected]>
+Date:   Fri Sep 4 15:39:00 2015 +0300
+
+    vmx: implement fast path vmx_composite_over_n_8888
+    
+    Running "lowlevel-blt-bench over_n_8888" on Playstation3 3.2GHz,
+    Gentoo ppc (32-bit userland) gave the following results:
+    
+    before:  over_n_8888 =  L1: 147.47  L2: 205.86  M:121.07
+    after:   over_n_8888 =  L1: 287.27  L2: 261.09  M:133.48
+    
+    Cairo non-trimmed benchmarks on POWER8, 3.4GHz 8 Cores:
+    
+    ocitysmap          659.69  -> 611.71   :  1.08x speedup
+    xfce4-terminal-a1  2725.22 -> 2547.47  :  1.07x speedup
+    
+    Signed-off-by: Siarhei Siamashka <[email protected]>
+    Signed-off-by: Oded Gabbay <[email protected]>
+
+commit 2876d8d3dd6a71cb9eb3ac93e5b9c18b71a452da
+Author: Ben Avison <[email protected]>
+Date:   Fri Sep 4 03:09:20 2015 +0100
+
+    affine-bench: remove 8e margin from COVER area
+    
+    Patch "Remove the 8e extra safety margin in COVER_CLIP analysis" reduced
+    the required image area for setting the COVER flags in
+    pixman.c:analyze_extent(). Do the same reduction in affine-bench.
+    
+    Leaving the old calculations in place would be very confusing for anyone
+    reading the code.
+    
+    Also add a comment that explains how affine-bench wants to hit the COVER
+    paths. This explains why the intricate extent calculations are copied
+    from pixman.c.
+    
+    [Pekka: split patch, change comments, write commit message]
+    Signed-off-by: Pekka Paalanen <[email protected]>
+    Reviewed-by: Ben Avison <[email protected]>
+
+commit 0e2e9751282b19280c92be4a80c5ae476bae0ce4
+Author: Ben Avison <[email protected]>
+Date:   Fri Sep 4 03:09:20 2015 +0100
+
+    Remove the 8e extra safety margin in COVER_CLIP analysis
+    
+    As discussed in
+    http://lists.freedesktop.org/archives/pixman/2015-August/003905.html
+    
+    the 8 * pixman_fixed_e (8e) adjustment which was applied to the transformed
+    coordinates is a legacy of rounding errors which used to occur in old
+    versions of Pixman, but which no longer apply. For any affine transform,
+    you are now guaranteed to get the same result by transforming the upper
+    coordinate as though you transform the lower coordinate and add (size-1)
+    steps of the increment in source coordinate space. No projective
+    transform routines use the COVER_CLIP flags, so they cannot be affected.
+    
+    Proof by Siarhei Siamashka:
+    
+    Let's take a look at the following affine transformation matrix (with 16.16
+    fixed point values) and two vectors:
+    
+             | a   b     c    |
+    M      = | d   e     f    |
+             | 0   0  0x10000 |
+    
+             |  x_dst  |
+    P     =  |  y_dst  |
+             | 0x10000 |
+    
+             | 0x10000 |
+    ONE_X  = |    0    |
+             |    0    |
+    
+    The current matrix multiplication code does the following calculations:
+    
+                 | (a * x_dst + b * y_dst + 0x8000) / 0x10000 + c |
+        M * P =  | (d * x_dst + e * y_dst + 0x8000) / 0x10000 + f |
+                 |                   0x10000                      |
+    
+    These calculations are not perfectly exact and we may get rounding
+    because the integer coordinates are adjusted by 0.5 (or 0x8000 in the
+    16.16 fixed point format) before doing matrix multiplication. For
+    example, if the 'a' coefficient is an odd number and 'b' is zero,
+    then we are losing some of the least significant bits when dividing by
+    0x10000.
+    
+    So we need to strictly prove that the following expression is always
+    true even though we have to deal with rounding:
+    
+                                              | a |
+        M * (P + ONE_X) - M * P = M * ONE_X = | d |
+                                              | 0 |
+    
+    or
+    
+       ((a * (x_dst + 0x10000) + b * y_dst + 0x8000) / 0x10000 + c)
+      -
+       ((a * x_dst             + b * y_dst + 0x8000) / 0x10000 + c)
+      =
+        a
+    
+    It's easy to see that this is equivalent to
+    
+        a + ((a * x_dst + b * y_dst + 0x8000) / 0x10000 + c)
+          - ((a * x_dst + b * y_dst + 0x8000) / 0x10000 + c)
+      =
+        a
+    
+    Which means that stepping exactly by one pixel horizontally in the
+    destination image space (advancing 'x_dst' by 0x10000) is the same as
+    changing the transformed 'x_src' coordinate in the source image space
+    exactly by 'a'. The same applies to the vertical direction too.
+    Repeating these steps, we can reach any pixel in the source image
+    space and get exactly the same fixed point coordinates as doing
+    matrix multiplications per each pixel.
+    
+    By the way, the older matrix multiplication implementation, which was
+    relying on less accurate calculations with three intermediate roundings
+    "((a + 0x8000) >> 16) + ((b + 0x8000) >> 16) + ((c + 0x8000) >> 16)",
+    also has the same properties. However reverting
+        http://cgit.freedesktop.org/pixman/commit/?id=ed39992564beefe6b12f81e842caba11aff98a9c
+    and applying this "Remove the 8e extra safety margin in COVER_CLIP
+    analysis" patch makes the cover test fail. The real reason why it fails
+    is that the old pixman code was using "pixman_transform_point_3d()"
+    function
+        http://cgit.freedesktop.org/pixman/tree/pixman/pixman-matrix.c?id=pixman-0.28.2#n49
+    for getting the transformed coordinate of the top left corner pixel
+    in the image scaling code, but at the same time using a different
+    "pixman_transform_point()" function
+        http://cgit.freedesktop.org/pixman/tree/pixman/pixman-matrix.c?id=pixman-0.28.2#n82
+    in the extents calculation code for setting the cover flag. And these
+    functions did the intermediate rounding differently. That's why the 8e
+    safety margin was needed.
+    
+    ** proof ends
+    
+    However, for COVER_CLIP_NEAREST, the actual margins added were not 8e.
+    Because the half-way cases round down, that is, coordinate 0 hits pixel
+    index -1 while coordinate e hits pixel index 0, the extra safety margins
+    were actually 7e to the left and up, and 9e to the right and down. This
+    patch removes the 7e and 9e margins and restores the -e adjustment
+    required for NEAREST sampling in Pixman. For reference, see
+    pixman/rounding.txt.
+    
+    For COVER_CLIP_BILINEAR, the margins were exactly 8e as there are no
+    additional offsets to be restored, so simply removing the 8e additions
+    is enough.
+    
+    Proof:
+    
+    All implementations must give the same numerical results as
+    bits_image_fetch_pixel_nearest() / bits_image_fetch_pixel_bilinear().
+    
+    The former does
+        int x0 = pixman_fixed_to_int (x - pixman_fixed_e);
+    which maps directly to the new test for the nearest flag, when you consider
+    that x0 must fall in the interval [0,width).
+    
+    The latter does
+        x1 = x - pixman_fixed_1 / 2;
+        x1 = pixman_fixed_to_int (x1);
+        x2 = x1 + 1;
+    When you write a COVER path, you take advantage of the assumption that
+    both x1 and x2 fall in the interval [0, width).
+    
+    As samplers are allowed to fetch the pixel at x2 unconditionally, we
+    require
+        x1 >= 0
+        x2 < width
+    so
+        x - pixman_fixed_1 / 2 >= 0
+        x - pixman_fixed_1 / 2 + pixman_fixed_1 < width * pixman_fixed_1
+    so
+        pixman_fixed_to_int (x - pixman_fixed_1 / 2) >= 0
+        pixman_fixed_to_int (x + pixman_fixed_1 / 2) < width
+    which matches the source code lines for the bilinear case, once you delete
+    the lines that add the 8e margin.
+    
+    Signed-off-by: Ben Avison <[email protected]>
+    [Pekka: adjusted commit message, left affine-bench changes for another patch]
+    [Pekka: add commit message parts from Siarhei]
+    Signed-off-by: Pekka Paalanen <[email protected]>
+    Reviewed-by: Siarhei Siamashka <[email protected]>
+    Reviewed-by: Ben Avison <[email protected]>
+
+commit 23525b4ea5bc2dd67f8f65b90d023b6580ecbc36
+Author: Ben Avison <[email protected]>
+Date:   Tue Sep 22 12:43:25 2015 +0100
+
+    pixman-general: Tighten up calculation of temporary buffer sizes
+    
+    Each of the aligns can only add a maximum of 15 bytes to the space
+    requirement. This permits some edge cases to use the stack buffer where
+    previously it would have deduced that a heap buffer was required.
+    
+    Reviewed-by: Pekka Paalanen <[email protected]>
+
+commit 8b49d4b6b460d0c9299bca4ccddd7cd00d8f8441
+Author: Siarhei Siamashka <[email protected]>
+Date:   Tue Sep 22 04:25:40 2015 +0300
+
+    pixman-general: Fix stack related pointer arithmetic overflow
+    
+    As https://bugs.freedesktop.org/show_bug.cgi?id=92027#c6 explains,
+    the stack is allocated at the very top of the process address space
+    in some configurations (32-bit x86 systems with ASLR disabled).
+    And the careless computations done with the 'dest_buffer' pointer
+    may overflow, failing the buffer upper limit check.
+    
+    The problem can be reproduced using the 'stress-test' program,
+    which segfaults when executed via setarch:
+    
+        export CFLAGS="-O2 -m32" && ./autogen.sh
+        ./configure --disable-libpng --disable-gtk && make
+        setarch i686 -R test/stress-test
+    
+    This patch introduces the required corrections. The extra check
+    for negative 'width' may be redundant (the invalid 'width' value
+    is not supposed to reach here), but it's better to play safe
+    when dealing with the buffers allocated on stack.
+    
+    Reported-by: Ludovic Courtès <[email protected]>
+    Signed-off-by: Siarhei Siamashka <[email protected]>
+    Reviewed-by: [email protected]
+    Signed-off-by: Oded Gabbay <[email protected]>
+
+commit 4297e9058d252cac653723fe0b1bee559fbac3a4
+Author: Thomas Petazzoni <[email protected]>
+Date:   Thu Sep 17 15:43:27 2015 +0200
+
+    test: add a check for FE_DIVBYZERO
+    
+    Some architectures, such as Microblaze and Nios2, currently do not
+    implement FE_DIVBYZERO, even though they have <fenv.h> and
+    feenableexcept(). This commit adds a configure.ac check to verify
+    whether FE_DIVBYZERO is defined or not, and if not, disables the
+    problematic code in test/utils.c.
+    
+    Signed-off-by: Thomas Petazzoni <[email protected]>
+    Signed-off-by: Marek Vasut <[email protected]>
+    Acked-by: Siarhei Siamashka <[email protected]>
+    Signed-off-by: Oded Gabbay <[email protected]>
+
+commit 8189fad9610981d5b4dcd8f8980ff169110fb33c
+Author: Oded Gabbay <[email protected]>
+Date:   Sun Sep 6 11:45:20 2015 +0300
+
+    vmx: Remove unused expensive functions
+    
+    Now that we replaced the expensive functions with better performing
+    alternatives, we should remove them so they will not be used again.
+    
+    Running Cairo benchmark on trimmed traces gave the following results:
+    
+    POWER8, 8 cores, 3.4GHz, RHEL 7.2 ppc64le.
+    
+    Speedups
+    ========
+    t-firefox-scrolling     1232.30 -> 1096.55 :  1.12x
+    t-gnome-terminal-vim    613.86  -> 553.10  :  1.11x
+    t-evolution             405.54  -> 371.02  :  1.09x
+    t-firefox-talos-gfx     919.31  -> 862.27  :  1.07x
+    t-gvim                  653.02  -> 616.85  :  1.06x
+    t-firefox-canvas-alpha  941.29  -> 890.42  :  1.06x
+    
+    Signed-off-by: Oded Gabbay <[email protected]>
+    Acked-by: Pekka Paalanen <[email protected]>
+    Acked-by: Siarhei Siamashka <[email protected]>
+
+commit 6b1b8b2b90da11bf6101a151786b2a8c9f087338
+Author: Oded Gabbay <[email protected]>
+Date:   Sun Jun 28 13:17:41 2015 +0300
+
+    vmx: implement fast path vmx_composite_over_n_8_8888
+    
+    POWER8, 8 cores, 3.4GHz, RHEL 7.2 ppc64le.
+    
+    reference memcpy speed = 25008.9MB/s (6252.2MP/s for 32bpp fills)
+    
+                    Before         After           Change
+                  ---------------------------------------------
+    L1              91.32          182.84         +100.22%
+    L2              94.94          182.83         +92.57%
+    M               95.55          181.51         +89.96%
+    HT              88.96          162.09         +82.21%
+    VT              87.4           168.35         +92.62%
+    R               83.37          146.23         +75.40%
+    RT              66.4           91.5           +37.80%
+    Kops/s          683            859            +25.77%
+    
+    Signed-off-by: Oded Gabbay <[email protected]>
+    Acked-by: Pekka Paalanen <[email protected]>
+    Acked-by: Siarhei Siamashka <[email protected]>
+
+commit 8d8caa55a38c00351047d24322e23b201b6b29ff
+Author: Oded Gabbay <[email protected]>
+Date:   Sun Sep 6 11:46:15 2015 +0300
+
+    vmx: optimize vmx_composite_over_n_8888_8888_ca
+    
+    This patch optimizes vmx_composite_over_n_8888_8888_ca by removing use
+    of expand_alpha_1x128, unpack/pack and in_over_2x128 in favor of
+    splat_alpha, in_over and MUL/ADD macros from pixman_combine32.h.
+    
+    Running "lowlevel-blt-bench -n over_8888_8888" on POWER8, 8 cores,
+    3.4GHz, RHEL 7.2 ppc64le gave the following results:
+    
+    reference memcpy speed = 23475.4MB/s (5868.8MP/s for 32bpp fills)
+    
+                    Before          After           Change
+                  --------------------------------------------
+    L1              244.97          474.05         +93.51%
+    L2              243.74          473.05         +94.08%
+    M               243.29          467.16         +92.02%
+    HT              144.03          252.79         +75.51%
+    VT              174.24          279.03         +60.14%
+    R               109.86          149.98         +36.52%
+    RT              47.96           53.18          +10.88%
+    Kops/s          524             576            +9.92%
+    
+    Signed-off-by: Oded Gabbay <[email protected]>
+    Acked-by: Pekka Paalanen <[email protected]>
+    Acked-by: Siarhei Siamashka <[email protected]>
+
+commit 857880f0e4d1d42a8508ac77be33556cc6f7f546
+Author: Oded Gabbay <[email protected]>
+Date:   Sun Sep 6 10:58:30 2015 +0300
+
+    vmx: optimize scaled_nearest_scanline_vmx_8888_8888_OVER
+    
+    This patch optimizes scaled_nearest_scanline_vmx_8888_8888_OVER and all
+    the functions it calls (combine1, combine4 and
+    core_combine_over_u_pixel_vmx).
+    
+    The optimization is done by removing use of expand_alpha_1x128 and
+    expand_alpha_2x128 in favor of splat_alpha and MUL/ADD macros from
+    pixman_combine32.h.
+    
+    Running "lowlevel-blt-bench -n over_8888_8888" on POWER8, 8 cores,
+    3.4GHz, RHEL 7.2 ppc64le gave the following results:
+    
+    reference memcpy speed = 24847.3MB/s (6211.8MP/s for 32bpp fills)
+    
+                    Before          After           Change
+                  --------------------------------------------
+    L1              182.05          210.22         +15.47%
+    L2              180.6           208.92         +15.68%
+    M               180.52          208.22         +15.34%
+    HT              130.17          178.97         +37.49%
+    VT              145.82          184.22         +26.33%
+    R               104.51          129.38         +23.80%
+    RT              48.3            61.54          +27.41%
+    Kops/s          430             504            +17.21%
+    
+    v2: Check *pm is not NULL before dereferencing it in combine1()
+    
+    Signed-off-by: Oded Gabbay <[email protected]>
+    Acked-by: Pekka Paalanen <[email protected]>
+    Acked-by: Siarhei Siamashka <[email protected]>
+
+commit 73e586efb3ee149f76f15d9e549bffa15d8e30ec
+Author: Pekka Paalanen <[email protected]>
+Date:   Mon Sep 7 14:40:49 2015 +0300
+
+    armv6: enable over_n_8888
+    
+    Enable the fast path added in the previous patch by moving the lookup
+    table entries to their proper locations.
+    
+    Lowlevel-blt-bench benchmark statistics with 30 iterations, showing the
+    effect of adding this one patch on top of
+    "armv6: Add over_n_8888 fast path (disabled)", which was applied on
+    fd595692941f3d9ddea8934462bd1d18aed07c65.
+    
+           Before          After
+          Mean StdDev     Mean StdDev   Confidence   Change
+    L1    12.5   0.04     45.2   0.10    100.00%    +263.1%
+    L2    11.1   0.02     43.2   0.03    100.00%    +289.3%
+    M      9.4   0.00     42.4   0.02    100.00%    +351.7%
+    HT     8.5   0.02     25.4   0.10    100.00%    +198.8%
+    VT     8.4   0.02     22.3   0.07    100.00%    +167.0%
+    R      8.2   0.02     23.1   0.09    100.00%    +183.6%
+    RT     5.4   0.05     11.4   0.21    100.00%    +110.3%
+    
+    At most 3 outliers rejected per test per set.
+    
+    Iterating here means that lowlevel-blt-bench was executed 30 times, and
+    the statistics above were computed from the output.
+    
+    Signed-off-by: Pekka Paalanen <[email protected]>
+
+commit 9eb6889b15a180cc94aad8ac97189af5b3a68b96
+Author: Ben Avison <[email protected]>
+Date:   Mon Sep 7 14:40:48 2015 +0300
+
+    armv6: Add over_n_8888 fast path (disabled)
+    
+    This new fast path is initially disabled by putting the entries in the
+    lookup table after the sentinel. The compiler cannot tell the new code
+    is not used, so it cannot eliminate the code. Also the lookup table size
+    will include the new fast path. When the follow-up patch then enables
+    the new fast path, the binary layout (alignments, size, etc.) will stay
+    the same compared to the disabled case.
+    
+    Keeping the binary layout identical is important for benchmarking on
+    Raspberry Pi 1. The addresses at which functions are loaded will have a
+    significant impact on benchmark results, causing unexpected performance
+    changes. Keeping all function addresses the same across the patch
+    enabling a new fast path improves the reliability of benchmarks.
+    
+    Benchmark results are included in the patch enabling this fast path.
+    
+    [Pekka: disabled the fast path, commit message]
+    Signed-off-by: Pekka Paalanen <[email protected]>
+
+commit 4c71f595e3393be5b922df37d50d71dd83f4f979
+Author: Ben Avison <[email protected]>
+Date:   Wed Sep 2 20:35:59 2015 +0100
+
+    test: Add cover-test v5
+    
+    This test aims to verify both numerical correctness and the honouring of
+    array bounds for scaled plots (both nearest-neighbour and bilinear) at or
+    close to the boundary conditions for applicability of "cover" type fast paths
+    and iter fetch routines.
+    
+    It has a secondary purpose: by setting the env var EXACT (to any value) it
+    will only test plots that are exactly on the boundary condition. This makes
+    it possible to ensure that "cover" routines are being used to the maximum,
+    although this requires the use of a debugger or code instrumentation to
+    verify.
+    
+    Changes in v4:
+    
+      Check the fence page size and skip the test if it is too large. Since
+      we need to deal with pixman_fixed_t coordinates that go beyond the
+      real image width, make the page size limit 16 kB. A 32 kB or larger
+      page size would cause an a8 image width to be 32k or more, which is no
+      longer representable in pixman_fixed_t.
+    
+      Use a shorthand variable 'filter' in test_cover().
+    
+      Whitespace adjustments.
+    
+    Changes in v5:
+    
+      Skip if fenced memory is not supported. Do you know of any such
+      platform?
+    
+    Signed-off-by: Ben Avison <[email protected]>
+    [Pekka: changes in v4 and v5]
+    Signed-off-by: Pekka Paalanen <[email protected]>
+    Reviewed-by: Ben Avison <[email protected]>
+    Acked-by: Oded Gabbay <[email protected]>
+
+commit 812c9c9758e1503bd1725af9c6fe9ede6a467506
+Author: Pekka Paalanen <[email protected]>
+Date:   Tue Sep 8 13:35:33 2015 +0300
+
+    implementation: add PIXMAN_DISABLE=wholeops
+    
+    Add a new option to PIXMAN_DISABLE: "wholeops". This option disables all
+    whole-operation fast paths regardless of implementation level, except
+    the general path (general_composite_rect).
+    
+    The purpose is to add a debug option that allows us to test optimized
+    iterator paths specifically. With this, it is possible to see if:
+    - fast paths mask bugs in iterators
+    - compare fast paths with iterator paths for performance
+    
+    The effect was tested on x86_64 by running:
+    $ PIXMAN_DISABLE='' ./test/lowlevel-blt-bench over_8888_8888
+    $ PIXMAN_DISABLE='wholeops' ./test/lowlevel-blt-bench over_8888_8888
+    
+    In the first case time is spent in sse2_composite_over_8888_8888(), and
+    in the latter in sse2_combine_over_u().
+    
+    Signed-off-by: Pekka Paalanen <[email protected]>
+    Reviewed-by: Oded Gabbay <[email protected]>
+
+commit e9ef2cc4dea04792a03d604c075c344055765217
+Author: Pekka Paalanen <[email protected]>
+Date:   Tue Sep 8 09:36:48 2015 +0300
+
+    utils.[ch]: add fence_get_page_size()
+    
+    Add a function to get the page size used for memory fence purposes, and
+    use it everywhere where getpagesize() was used.
+    
+    This offers a single point in code to override the page size, in case
+    one wants to experiment how the tests work with a higher page size than
+    what the developer's machine has.
+    
+    This also offers a clean API, without adding #ifdefs, to tests for
+    checking the page size.
+    
+    Signed-off-by: Pekka Paalanen <[email protected]>
+    Reviewed-by: Oded Gabbay <[email protected]>
+    Reviewed-by: Ben Avison <[email protected]>
+
+commit 82f8c997dfd3f60a48134107ecf38663b464bdc9
+Author: Pekka Paalanen <[email protected]>
+Date:   Tue Sep 8 09:20:46 2015 +0300
+
+    utils.c: fix fallback code for fence_image_create_bits()
+    
+    Used a wrong variable name, causing:
+    /home/pq/git/pixman/demos/../test/utils.c: In function ‘fence_image_create_bits’:
+    /home/pq/git/pixman/demos/../test/utils.c:562:46: error: ‘width’ undeclared (first use in this function)
+    
+    Use the correct variable.
+    
+    Signed-off-by: Pekka Paalanen <[email protected]>
+    Reviewed-by: Oded Gabbay <[email protected]>
+    Reviewed-by: Ben Avison <[email protected]>
+
+commit 07006853828a59b5e0cd7d7d058d03db4e23e6ec
+Author: Pekka Paalanen <[email protected]>
+Date:   Thu May 7 17:16:05 2015 +0300
+
+    test: add fence-image-self-test
+    
+    Tests that fence_malloc and fence_image_create_bits actually work: that
+    out-of-bounds and out-of-row (unused stride area) accesses trigger
+    SIGSEGV.
+    
+    If fence_malloc is a dummy (FENCE_MALLOC_ACTIVE not defined), this test
+    is skipped.
+    
+    Changes in v2:
+    
+    - check FENCE_MALLOC_ACTIVE value, not whether it is defined
+    - test that reading bytes near the fence pages does not cause a
+      segmentation fault
+    
+    Changes in v3:
+    
+    - Do not print progress messages unless VERBOSE environment variable is
+      set. Avoid spamming the terminal output of 'make check' on some
+      versions of autotools.
+    
+    Signed-off-by: Pekka Paalanen <[email protected]>
+    Reviewed-by: Ben Avison <[email protected]>
+
+commit 13d93aa12050ce99643d56b0c730404294f46c2f
+Author: Pekka Paalanen <[email protected]>
+Date:   Thu May 7 16:46:01 2015 +0300
+
+    utils.[ch]: add fence_image_create_bits ()
+    
+    Useful for detecting out-of-bounds accesses in composite operations.
+    
+    This will be used by follow-up patches adding new tests.
+    
+    Changes in v2:
+    
+    - fix style on fence_image_create_bits args
+    - add page to stride only if stride_fence
+    - add comment on the fallback definition about freeing storage
+    
+    Signed-off-by: Pekka Paalanen <[email protected]>
+    Reviewed-by: Ben Avison <[email protected]>
+
+commit c70ddd5c9e12d87ff461d73a6f53b00d52925cf5
+Author: Pekka Paalanen <[email protected]>
+Date:   Thu May 7 14:21:30 2015 +0300
+
+    utils.[ch]: add FENCE_MALLOC_ACTIVE
+    
+    Define a new token to simplify checking whether fence_malloc() actually
+    can catch out-of-bounds access.
+    
+    This will be used in the future to skip tests that rely on fence_malloc
+    checking functionality.
+    
+    Changes in v2:
+    
+    - #define FENCE_MALLOC_ACTIVE always, but change its value to help catch
+      use of it without including utils.h
+    
+    Signed-off-by: Pekka Paalanen <[email protected]>
+    Reviewed-by: Ben Avison <[email protected]>
+
+commit a82e519944e5d1af41cc94a14d9ae1fe0e430e68
+Author: Ben Avison <[email protected]>
+Date:   Thu Aug 20 13:07:48 2015 +0100
+
+    scaling-test: list more details when verbose
+    
+    Add mask details to the output.
+    
+    [Pekka: redo whitespace and print src,dst,mask x and y.]
+    Signed-off-by: Pekka Paalanen <[email protected]>
+    Reviewed-by: Ben Avison <[email protected]>
+
+commit fd595692941f3d9ddea8934462bd1d18aed07c65
+Author: Pekka Paalanen <[email protected]>
+Date:   Tue Jul 7 11:31:20 2015 +0300
+
+    lowlevel-blt-bench: make extra arguments an error
+    
+    If a user gives multiple patterns or extra arguments, only the last one
+    was used as the pattern while the former were just ignored. This is a
+    user error silently converted to something possibly unexpected.
+    
+    In presence of extra arguments, complain and quit.
+    
+    Cc: Ben Avison <[email protected]>
+    Signed-off-by: Pekka Paalanen <[email protected]>
+
+commit 69611473c5a4e7cc2e6016d82ff4ed28e289484a
+Author: Oded Gabbay <[email protected]>
+Date:   Sat Aug 1 23:01:43 2015 +0300
+
+    Post-release version bump to 0.33.3
+    
+    Signed-off-by: Oded Gabbay <[email protected]>
+
 commit ee790044b08e3b668e6aa5d9229f46ed7295ebf0
 Author: Oded Gabbay <[email protected]>
 Date:   Sat Aug 1 22:34:53 2015 +0300
diff --git a/debian/changelog b/debian/changelog
index 42e6d85..be437ce 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,9 @@
+pixman (0.33.4-1) UNRELEASED; urgency=medium
+
+  * New upstream release candidate.
+
+ -- Andreas Boll <[email protected]>  Wed, 04 Nov 2015 10:30:37 +0100
+
 pixman (0.33.2-2) sid; urgency=medium
 
   * Run tests with VERBOSE=1.
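
Note on the ChangeLog entry "Remove the 8e extra safety margin in COVER_CLIP
analysis" above: its central claim, that advancing x_dst by 0x10000 changes the
transformed x coordinate by exactly 'a', can be checked numerically. The
following is a minimal sketch and not pixman code. transform_x() and
step_is_exact() are hypothetical helpers that mirror only the x row of the
M * P formula quoted in the proof, and the sketch assumes that the division by
0x10000 rounds toward negative infinity (floor), computed on a 64-bit
intermediate.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a pixman function): the x row of M * P as
 * written in the proof, with 16.16 fixed point inputs. */
int64_t
transform_x (int32_t a, int32_t b, int32_t c, int32_t x_dst, int32_t y_dst)
{
    int64_t sum = (int64_t) a * x_dst + (int64_t) b * y_dst + 0x8000;

    /* Floor division by 0x10000 (assumed rounding direction). C division
     * truncates toward zero, so adjust when the remainder is negative. */
    int64_t q = sum / 0x10000;

    if (sum % 0x10000 < 0)
        q--;

    return q + c;
}

/* Returns 1 if stepping one destination pixel to the right moves the
 * transformed x coordinate by exactly 'a', 0 otherwise. */
int
step_is_exact (int32_t a, int32_t b, int32_t c, int32_t x_dst, int32_t y_dst)
{
    int64_t here, next;

    /* The caller must leave room for the one-pixel step. */
    assert (x_dst <= INT32_MAX - 0x10000);

    here = transform_x (a, b, c, x_dst, y_dst);
    next = transform_x (a, b, c, x_dst + 0x10000, y_dst);

    return next - here == a;
}
```

The step adds exactly a * 0x10000 to the sum before the division, which is a
whole multiple of the divisor, so under floor rounding the quotient grows by
exactly 'a' regardless of the discarded low bits. With truncating division the
same check could fail when the sum changes sign, which is why the rounding
direction is stated as an assumption here.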

commit fa71d08a81c9bf3f2366ee45474ff868d9e10b8e
Author: Oded Gabbay <[email protected]>
Date:   Fri Oct 23 17:58:49 2015 +0300

    Pre-release version bump to 0.33.4
    
    Signed-off-by: Oded Gabbay <[email protected]>

diff --git a/configure.ac b/configure.ac
index b04cc69..dcacff1 100644
--- a/configure.ac
+++ b/configure.ac
@@ -54,7 +54,7 @@ AC_PREREQ([2.57])
 
 m4_define([pixman_major], 0)
 m4_define([pixman_minor], 33)
-m4_define([pixman_micro], 3)
+m4_define([pixman_micro], 4)
 
 m4_define([pixman_version],[pixman_major.pixman_minor.pixman_micro])
 

commit 9728241bd098bc4260e6cd83997dfecc64adc356
Author: Andrea Canciani <[email protected]>
Date:   Tue Oct 13 13:35:59 2015 +0200

    test: Fix fence-image-self-test on Mac
    
    On MacOS X, according to the manpage of mprotect(), "When a program
    violates the protections of a page, it gets a SIGBUS or SIGSEGV
    signal.", but fence-image-self-test was only accepting a SIGSEGV as
    notification of invalid access.
    
    Fixes fence-image-self-test
    
    Reviewed-by: Pekka Paalanen <[email protected]>

diff --git a/test/fence-image-self-test.c b/test/fence-image-self-test.c
index c883038..c80b3cf 100644
--- a/test/fence-image-self-test.c
+++ b/test/fence-image-self-test.c
@@ -73,7 +73,7 @@ prinfo (const char *fmt, ...)
 }
 
 static void
-do_expect_segv (void (*fn)(void *), void *data)
+do_expect_signal (void (*fn)(void *), void *data)
 {
     struct sigaction sa;
 
@@ -82,6 +82,8 @@ do_expect_segv (void (*fn)(void *), void *data)
     sa.sa_sigaction = segv_handler;
     if (sigaction (SIGSEGV, &sa, NULL) == -1)
         die ("sigaction failed", errno);
+    if (sigaction (SIGBUS, &sa, NULL) == -1)
+        die ("sigaction failed", errno);
 
     (*fn)(data);
 
@@ -96,7 +98,7 @@ do_expect_segv (void (*fn)(void *), void *data)
  * to exit with success, and return failure otherwise.
  */
 static pixman_bool_t
-expect_segv (void (*fn)(void *), void *data)
+expect_signal (void (*fn)(void *), void *data)
 {
     pid_t pid, wp;
     int status;
@@ -106,7 +108,7 @@ expect_segv (void (*fn)(void *), void *data)
         die ("fork failed", errno);
 
     if (pid == 0)
-        do_expect_segv (fn, data); /* never returns */
+        do_expect_signal (fn, data); /* never returns */
 
     wp = waitpid (pid, &status, 0);
     if (wp != pid)
@@ -131,9 +133,9 @@ test_read_fault (uint8_t *p, int offset)
 {
     prinfo ("*(uint8_t *)(%p + %d)", p, offset);
 
-    if (expect_segv (read_u8, p + offset))
+    if (expect_signal (read_u8, p + offset))
     {
-        prinfo ("\tSEGV OK\n");
+        prinfo ("\tsignal OK\n");
 
         return TRUE;
     }
diff --git a/test/utils.c b/test/utils.c
index 8657966..f8e42a5 100644
--- a/test/utils.c
+++ b/test/utils.c
@@ -471,9 +471,9 @@ fence_image_destroy (pixman_image_t *image, void *data)
  * min_width is only a minimum width for the image. The width is aligned up
  * for the row size to be divisible by both page size and pixel size.
  *
- * If stride_fence is true, the additional page on each row will be armed
- * to cause SIGSEVG on all accesses. This should catch all accesses outside
- * the valid row pixels.
+ * If stride_fence is true, the additional page on each row will be
+ * armed to cause SIGSEGV or SIGBUS on all accesses. This should catch
+ * all accesses outside the valid row pixels.
  */
 pixman_image_t *
 fence_image_create_bits (pixman_format_code_t format,
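
The behaviour this commit relies on can be sketched in isolation. The following is a minimal illustration, not the code from fence-image-self-test.c: the names fault_handler, read_faults and demo_protected_page are hypothetical, and it assumes a POSIX system with fork(), sigaction() and mprotect(). A child process installs one handler for both SIGSEGV and SIGBUS, touches the address, and reports through its exit status whether a fault was delivered.

```c
#define _DEFAULT_SOURCE 1 /* assumption: exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON /* older BSD / Mac OS X spelling */
#endif

/* Hypothetical handler: the expected fault arrived, so exit with success. */
static void
fault_handler (int sig)
{
    (void) sig;
    _exit (0);
}

/* Hypothetical helper: returns 1 if reading *p in a child process raises
 * SIGSEGV or SIGBUS, 0 if the read completes, and -1 on a setup error. */
static int
read_faults (volatile unsigned char *p)
{
    pid_t pid = fork ();
    int status;

    if (pid == -1)
        return -1;

    if (pid == 0)
    {
        struct sigaction sa;

        memset (&sa, 0, sizeof sa);
        sigemptyset (&sa.sa_mask);
        sa.sa_handler = fault_handler;

        /* Accept either signal as notification of the invalid access. */
        if (sigaction (SIGSEGV, &sa, NULL) == -1 ||
            sigaction (SIGBUS, &sa, NULL) == -1)
            _exit (2);

        (void) *p;  /* volatile read: faults if the page is protected */
        _exit (1);  /* no fault was delivered */
    }

    if (waitpid (pid, &status, 0) != pid || !WIFEXITED (status))
        return -1;

    if (WEXITSTATUS (status) == 0)
        return 1;
    return WEXITSTATUS (status) == 1 ? 0 : -1;
}

/* Map one page, revoke all access, and check that a read faults. */
static int
demo_protected_page (void)
{
    long page = sysconf (_SC_PAGESIZE);
    void *mem;
    int result;

    assert (page > 0);

    mem = mmap (NULL, (size_t) page, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    if (mprotect (mem, (size_t) page, PROT_NONE) == -1)
    {
        munmap (mem, (size_t) page);
        return -1;
    }

    result = read_faults ((volatile unsigned char *) mem);
    munmap (mem, (size_t) page);
    return result;
}
```

Running the access in a forked child keeps the fault from taking down the caller, which is the same reason the test above forks before calling do_expect_signal().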

commit 7de61d8d14e84623b6fa46506eb74f938287f536
Author: Matt Turner <[email protected]>
Date:   Sun Oct 11 14:44:46 2015 -0700

    mmx: Use MMX2 intrinsics from xmmintrin.h directly.
    
    We had lots of hacks to handle the inability to include xmmintrin.h
    without compiling with -msse (lest SSE instructions be used in
    pixman-mmx.c). Some recent version of gcc relaxed this restriction.
    
    Change configure.ac to test that xmmintrin.h can be included and that we
    can use some intrinsics from it, and remove the work-around code from
    pixman-mmx.c.
    
    Evidently allows gcc 4.9.3 to optimize better as well:
    
       text        data     bss     dec     hex filename
     657078       30848     680  688606   a81de libpixman-1.so.0.33.3 before
     656710       30848     680  688238   a806e libpixman-1.so.0.33.3 after
    
    Reviewed-by: Siarhei Siamashka <[email protected]>
    Tested-by: Pekka Paalanen <[email protected]>
    Signed-off-by: Matt Turner <[email protected]>

diff --git a/configure.ac b/configure.ac
index 424bfd3..b04cc69 100644
--- a/configure.ac
+++ b/configure.ac
@@ -347,21 +347,14 @@ AC_COMPILE_IFELSE([AC_LANG_SOURCE([[
 #error "Need GCC >= 3.4 for MMX intrinsics"
 #endif
 #include <mmintrin.h>
+#include <xmmintrin.h>
 int main () {
     __m64 v = _mm_cvtsi32_si64 (1);
     __m64 w;
 
-    /* Some versions of clang will choke on K */
-    asm ("pshufw %2, %1, %0\n\t"
-        : "=y" (w)
-        : "y" (v), "K" (5)
-    );
-
-    /* Some versions of clang will choke on this */
-    asm ("pmulhuw %1, %0\n\t"
-       : "+y" (w)
-       : "y" (v)
-    );
+    /* Test some intrinsics from xmmintrin.h */
+    w = _mm_shuffle_pi16(v, 5);
+    w = _mm_mulhi_pu16(w, w);
 
     return _mm_cvtsi64_si32 (v);
 }]])], have_mmx_intrinsics=yes)
diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c
index 05c48a4..88c3a39 100644
--- a/pixman/pixman-mmx.c
+++ b/pixman/pixman-mmx.c
@@ -40,6 +40,9 @@
 #else
 #include <mmintrin.h>
 #endif
+#ifdef USE_X86_MMX
+#include <xmmintrin.h>
+#endif
 #include "pixman-private.h"
 #include "pixman-combine32.h"
 #include "pixman-inlines.h"
@@ -59,66 +62,7 @@ _mm_empty (void)
 }
 #endif
 
-#ifdef USE_X86_MMX
-# if (defined(__SUNPRO_C) || defined(_MSC_VER) || defined(_WIN64))
-#  include <xmmintrin.h>
-# else
-/* We have to compile with -msse to use xmmintrin.h, but that causes SSE
- * instructions to be generated that we don't want. Just duplicate the
- * functions we want to use.  */
-extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_movemask_pi8 (__m64 __A)
-{
-    int ret;
-
-    asm ("pmovmskb %1, %0\n\t"
-       : "=r" (ret)
-       : "y" (__A)
-    );
-
-    return ret;
-}
-
-extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mulhi_pu16 (__m64 __A, __m64 __B)
-{
-    asm ("pmulhuw %1, %0\n\t"
-       : "+y" (__A)
-       : "y" (__B)
-    );
-    return __A;
-}
-
-#  ifdef __OPTIMIZE__
-extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_shuffle_pi16 (__m64 __A, int8_t const __N)
-{
-    __m64 ret;
-
-    asm ("pshufw %2, %1, %0\n\t"
-       : "=y" (ret)
-       : "y" (__A), "K" (__N)
-    );
-
-    return ret;
-}
-#  else
-#   define _mm_shuffle_pi16(A, N)                                      \
-    ({                                                                 \
-       __m64 ret;                                                      \
-                                                                       \
-       asm ("pshufw %2, %1, %0\n\t"                                    \
-            : "=y" (ret)                                               \
-            : "y" (A), "K" ((const int8_t)N)                           \
-       );                                                              \
-                                                                       \
-       ret;                                                            \
-    })
-#  endif
-# endif
-#endif
-
-#ifndef _MSC_VER
+#ifndef _MM_SHUFFLE
 #define _MM_SHUFFLE(fp3,fp2,fp1,fp0) \
  (((fp3) << 6) | ((fp2) << 4) | ((fp1) << 2) | (fp0))
 #endif
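
The _MM_SHUFFLE fallback kept by this hunk packs four 2-bit lane selectors into the 8-bit immediate consumed by pshufw, which is the instruction behind _mm_shuffle_pi16. The sketch below models that selection in plain C so it can be tried without MMX hardware. The function shuffle16_model is hypothetical and is not part of pixman or xmmintrin.h; it only illustrates how the immediate is decoded.

```c
#include <assert.h>
#include <stdint.h>

/* Same fallback definition as in pixman-mmx.c, used only when the
 * compiler's headers have not already provided the macro. */
#ifndef _MM_SHUFFLE
#define _MM_SHUFFLE(fp3,fp2,fp1,fp0) \
 (((fp3) << 6) | ((fp2) << 4) | ((fp1) << 2) | (fp0))
#endif

/* Hypothetical scalar model of pshufw: output word i is the input word
 * selected by bits [2i+1:2i] of the immediate. */
static void
shuffle16_model (const uint16_t in[4], int imm, uint16_t out[4])
{
    int i;

    assert (imm >= 0 && imm <= 255);

    for (i = 0; i < 4; i++)
        out[i] = in[(imm >> (2 * i)) & 3];
}
```

For example, _MM_SHUFFLE (0, 1, 2, 3) evaluates to 0x1B and reverses the four words, while _MM_SHUFFLE (3, 3, 3, 3) broadcasts word 3 to every lane.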

commit 90e62c086766afffd289a321c7de8ea4b5cac87d
Author: Siarhei Siamashka <[email protected]>
Date:   Fri Sep 4 15:39:00 2015 +0300

    vmx: implement fast path vmx_composite_over_n_8888
    
    Running "lowlevel-blt-bench over_n_8888" on Playstation3 3.2GHz,
    Gentoo ppc (32-bit userland) gave the following results:
    
    before:  over_n_8888 =  L1: 147.47  L2: 205.86  M:121.07
    after:   over_n_8888 =  L1: 287.27  L2: 261.09  M:133.48
    
    Cairo non-trimmed benchmarks on POWER8, 3.4GHz 8 Cores:
    
    ocitysmap          659.69  -> 611.71   :  1.08x speedup
    xfce4-terminal-a1  2725.22 -> 2547.47  :  1.07x speedup
