 ChangeLog                      |  667 +++++++++++++++++++++++++++++++++++++++++
 configure.ac                   |   22 -
 debian/changelog               |    7
 pixman/pixman-arm-simd-asm.S   |   41 ++
 pixman/pixman-arm-simd.c       |    6
 pixman/pixman-general.c        |   18 -
 pixman/pixman-implementation.c |   16
 pixman/pixman-mmx.c            |   64 ---
 pixman/pixman-vmx.c            |  492 ++++++++++++------------------
 pixman/pixman.c                |   17 -
 test/Makefile.sources          |    2
 test/affine-bench.c            |   24 +
 test/cover-test.c              |  449 +++++++++++++++++++++++++++
 test/fence-image-self-test.c   |  239 ++++++++++++++
 test/lowlevel-blt-bench.c      |    6
 test/scaling-test.c            |   66 ++--
 test/utils.c                   |  133 +++++++-
 test/utils.h                   |   21 +
 18 files changed, 1873 insertions(+), 417 deletions(-)

New commits:
commit 017a59ec26f3d70b577ddf868551f16198806f81
Author: Andreas Boll <[email protected]>
Date:   Wed Nov 4 13:26:38 2015 +0100

    Upload to unstable

diff --git a/debian/changelog b/debian/changelog
index be437ce..98410b4 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,8 +1,9 @@
-pixman (0.33.4-1) UNRELEASED; urgency=medium
+pixman (0.33.4-1) unstable; urgency=medium

+  * Team upload.
   * New upstream release candidate.

- -- Andreas Boll <[email protected]> Wed, 04 Nov 2015 10:30:37 +0100
+ -- Andreas Boll <[email protected]> Wed, 04 Nov 2015 13:26:18 +0100

 pixman (0.33.2-2) sid; urgency=medium

commit c19373008340e1ad159ded12e45275b5b06bb513
Author: Andreas Boll <[email protected]>
Date:   Wed Nov 4 10:30:58 2015 +0100

    Bump changelogs.

diff --git a/ChangeLog b/ChangeLog
index 96b8c28..9a56a72 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,670 @@
+commit fa71d08a81c9bf3f2366ee45474ff868d9e10b8e
+Author: Oded Gabbay <[email protected]>
+Date:   Fri Oct 23 17:58:49 2015 +0300
+
+    Pre-release version bump to 0.33.4
+
+    Signed-off-by: Oded Gabbay <[email protected]>
+
+commit 9728241bd098bc4260e6cd83997dfecc64adc356
+Author: Andrea Canciani <[email protected]>
+Date:   Tue Oct 13 13:35:59 2015 +0200
+
+    test: Fix fence-image-self-test on Mac
+
+    On MacOS X, according to the manpage of mprotect(), "When a program
+    violates the protections of a page, it gets a SIGBUS or SIGSEGV
+    signal.", but fence-image-self-test was only accepting a SIGSEGV as
+    notification of invalid access.
+
+    Fixes fence-image-self-test
+
+    Reviewed-by: Pekka Paalanen <[email protected]>
+
+commit 7de61d8d14e84623b6fa46506eb74f938287f536
+Author: Matt Turner <[email protected]>
+Date:   Sun Oct 11 14:44:46 2015 -0700
+
+    mmx: Use MMX2 intrinsics from xmmintrin.h directly.
+
+    We had lots of hacks to handle the inability to include xmmintrin.h
+    without compiling with -msse (lest SSE instructions be used in
+    pixman-mmx.c). Some recent version of gcc relaxed this restriction.
+
+    Change configure.ac to test that xmmintrin.h can be included and that we
+    can use some intrinsics from it, and remove the work-around code from
+    pixman-mmx.c.
+
+    Evidently allows gcc 4.9.3 to optimize better as well:
+
+       text    data     bss     dec     hex filename
+     657078   30848     680  688606   a81de libpixman-1.so.0.33.3 before
+     656710   30848     680  688238   a806e libpixman-1.so.0.33.3 after
+
+    Reviewed-by: Siarhei Siamashka <[email protected]>
+    Tested-by: Pekka Paalanen <[email protected]>
+    Signed-off-by: Matt Turner <[email protected]>
+
+commit 90e62c086766afffd289a321c7de8ea4b5cac87d
+Author: Siarhei Siamashka <[email protected]>
+Date:   Fri Sep 4 15:39:00 2015 +0300
+
+    vmx: implement fast path vmx_composite_over_n_8888
+
+    Running "lowlevel-blt-bench over_n_8888" on Playstation3 3.2GHz,
+    Gentoo ppc (32-bit userland) gave the following results:
+
+    before:  over_n_8888 =  L1: 147.47  L2: 205.86  M:121.07
+    after:   over_n_8888 =  L1: 287.27  L2: 261.09  M:133.48
+
+    Cairo non-trimmed benchmarks on POWER8, 3.4GHz 8 Cores:
+
+    ocitysmap           659.69 -> 611.71   :  1.08x speedup
+    xfce4-terminal-a1  2725.22 -> 2547.47  :  1.07x speedup
+
+    Signed-off-by: Siarhei Siamashka <[email protected]>
+    Signed-off-by: Oded Gabbay <[email protected]>
+
+commit 2876d8d3dd6a71cb9eb3ac93e5b9c18b71a452da
+Author: Ben Avison <[email protected]>
+Date:   Fri Sep 4 03:09:20 2015 +0100
+
+    affine-bench: remove 8e margin from COVER area
+
+    Patch "Remove the 8e extra safety margin in COVER_CLIP analysis" reduced
+    the required image area for setting the COVER flags in
+    pixman.c:analyze_extent(). Do the same reduction in affine-bench.
+
+    Leaving the old calculations in place would be very confusing for anyone
+    reading the code.
+
+    Also add a comment that explains how affine-bench wants to hit the COVER
+    paths. This explains why the intricate extent calculations are copied
+    from pixman.c.
+
+    [Pekka: split patch, change comments, write commit message]
+    Signed-off-by: Pekka Paalanen <[email protected]>
+    Reviewed-by: Ben Avison <[email protected]>
+
+commit 0e2e9751282b19280c92be4a80c5ae476bae0ce4
+Author: Ben Avison <[email protected]>
+Date:   Fri Sep 4 03:09:20 2015 +0100
+
+    Remove the 8e extra safety margin in COVER_CLIP analysis
+
+    As discussed in
+    http://lists.freedesktop.org/archives/pixman/2015-August/003905.html
+
+    the 8 * pixman_fixed_e (8e) adjustment which was applied to the transformed
+    coordinates is a legacy of rounding errors which used to occur in old
+    versions of Pixman, but which no longer apply. For any affine transform,
+    you are now guaranteed to get the same result by transforming the upper
+    coordinate as though you transform the lower coordinate and add (size-1)
+    steps of the increment in source coordinate space. No projective
+    transform routines use the COVER_CLIP flags, so they cannot be affected.
+
+    Proof by Siarhei Siamashka:
+
+    Let's take a look at the following affine transformation matrix (with 16.16
+    fixed point values) and two vectors:
+
+        | a   b     c    |
+    M = | d   e     f    |
+        | 0   0  0x10000 |
+
+        |  x_dst  |
+    P = |  y_dst  |
+        | 0x10000 |
+
+            | 0x10000 |
+    ONE_X = |    0    |
+            |    0    |
+
+    The current matrix multiplication code does the following calculations:
+
+            | (a * x_dst + b * y_dst + 0x8000) / 0x10000 + c |
+    M * P = | (d * x_dst + e * y_dst + 0x8000) / 0x10000 + f |
+            |                    0x10000                     |
+
+    These calculations are not perfectly exact and we may get rounding
+    because the integer coordinates are adjusted by 0.5 (or 0x8000 in the
+    16.16 fixed point format) before doing matrix multiplication. For
+    example, if the 'a' coefficient is an odd number and 'b' is zero,
+    then we are losing some of the least significant bits when dividing by
+    0x10000.
+
+    So we need to strictly prove that the following expression is always
+    true even though we have to deal with rounding:
+
+                                          | a |
+    M * (P + ONE_X) - M * P = M * ONE_X = | d |
+                                          | 0 |
+
+    or
+
+    ((a * (x_dst + 0x10000) + b * y_dst + 0x8000) / 0x10000 + c)
+      -
+    ((a * x_dst + b * y_dst + 0x8000) / 0x10000 + c)
+      =
+    a
+
+    It's easy to see that this is equivalent to
+
+    a + ((a * x_dst + b * y_dst + 0x8000) / 0x10000 + c)
+      - ((a * x_dst + b * y_dst + 0x8000) / 0x10000 + c)
+      =
+    a
+
+    Which means that stepping exactly by one pixel horizontally in the
+    destination image space (advancing 'x_dst' by 0x10000) is the same as
+    changing the transformed 'x_src' coordinate in the source image space
+    exactly by 'a'. The same applies to the vertical direction too.
+    Repeating these steps, we can reach any pixel in the source image
+    space and get exactly the same fixed point coordinates as doing
+    matrix multiplications per each pixel.
+
+    By the way, the older matrix multiplication implementation, which was
+    relying on less accurate calculations with three intermediate roundings
+    "((a + 0x8000) >> 16) + ((b + 0x8000) >> 16) + ((c + 0x8000) >> 16)",
+    also has the same properties. However reverting
+    http://cgit.freedesktop.org/pixman/commit/?id=ed39992564beefe6b12f81e842caba11aff98a9c
+    and applying this "Remove the 8e extra safety margin in COVER_CLIP
+    analysis" patch makes the cover test fail. The real reason why it fails
+    is that the old pixman code was using "pixman_transform_point_3d()"
+    function
+    http://cgit.freedesktop.org/pixman/tree/pixman/pixman-matrix.c?id=pixman-0.28.2#n49
+    for getting the transformed coordinate of the top left corner pixel
+    in the image scaling code, but at the same time using a different
+    "pixman_transform_point()" function
+    http://cgit.freedesktop.org/pixman/tree/pixman/pixman-matrix.c?id=pixman-0.28.2#n82
+    in the extents calculation code for setting the cover flag.
+    And these functions did the intermediate rounding differently. That's why
+    the 8e safety margin was needed.
+
+    ** proof ends
+
+    However, for COVER_CLIP_NEAREST, the actual margins added were not 8e.
+    Because the half-way cases round down, that is, coordinate 0 hits pixel
+    index -1 while coordinate e hits pixel index 0, the extra safety margins
+    were actually 7e to the left and up, and 9e to the right and down. This
+    patch removes the 7e and 9e margins and restores the -e adjustment
+    required for NEAREST sampling in Pixman. For reference, see
+    pixman/rounding.txt.
+
+    For COVER_CLIP_BILINEAR, the margins were exactly 8e as there are no
+    additional offsets to be restored, so simply removing the 8e additions
+    is enough.
+
+    Proof:
+
+    All implementations must give the same numerical results as
+    bits_image_fetch_pixel_nearest() / bits_image_fetch_pixel_bilinear().
+
+    The former does
+        int x0 = pixman_fixed_to_int (x - pixman_fixed_e);
+    which maps directly to the new test for the nearest flag, when you consider
+    that x0 must fall in the interval [0,width).
+
+    The latter does
+        x1 = x - pixman_fixed_1 / 2;
+        x1 = pixman_fixed_to_int (x1);
+        x2 = x1 + 1;
+    When you write a COVER path, you take advantage of the assumption that
+    both x1 and x2 fall in the interval [0, width).
+
+    As samplers are allowed to fetch the pixel at x2 unconditionally, we
+    require
+        x1 >= 0
+        x2 < width
+    so
+        x - pixman_fixed_1 / 2 >= 0
+        x - pixman_fixed_1 / 2 + pixman_fixed_1 < width * pixman_fixed_1
+    so
+        pixman_fixed_to_int (x - pixman_fixed_1 / 2) >= 0
+        pixman_fixed_to_int (x + pixman_fixed_1 / 2) < width
+    which matches the source code lines for the bilinear case, once you delete
+    the lines that add the 8e margin.
+
+    Signed-off-by: Ben Avison <[email protected]>
+    [Pekka: adjusted commit message, left affine-bench changes for another patch]
+    [Pekka: add commit message parts from Siarhei]
+    Signed-off-by: Pekka Paalanen <[email protected]>
+    Reviewed-by: Siarhei Siamashka <[email protected]>
+    Reviewed-by: Ben Avison <[email protected]>
+
+commit 23525b4ea5bc2dd67f8f65b90d023b6580ecbc36
+Author: Ben Avison <[email protected]>
+Date:   Tue Sep 22 12:43:25 2015 +0100
+
+    pixman-general: Tighten up calculation of temporary buffer sizes
+
+    Each of the aligns can only add a maximum of 15 bytes to the space
+    requirement. This permits some edge cases to use the stack buffer where
+    previously it would have deduced that a heap buffer was required.
+
+    Reviewed-by: Pekka Paalanen <[email protected]>
+
+commit 8b49d4b6b460d0c9299bca4ccddd7cd00d8f8441
+Author: Siarhei Siamashka <[email protected]>
+Date:   Tue Sep 22 04:25:40 2015 +0300
+
+    pixman-general: Fix stack related pointer arithmetic overflow
+
+    As https://bugs.freedesktop.org/show_bug.cgi?id=92027#c6 explains,
+    the stack is allocated at the very top of the process address space
+    in some configurations (32-bit x86 systems with ASLR disabled).
+    And the careless computations done with the 'dest_buffer' pointer
+    may overflow, failing the buffer upper limit check.
+
+    The problem can be reproduced using the 'stress-test' program,
+    which segfaults when executed via setarch:
+
+        export CFLAGS="-O2 -m32" && ./autogen.sh
+        ./configure --disable-libpng --disable-gtk && make
+        setarch i686 -R test/stress-test
+
+    This patch introduces the required corrections. The extra check
+    for negative 'width' may be redundant (the invalid 'width' value
+    is not supposed to reach here), but it's better to play safe
+    when dealing with the buffers allocated on stack.
+
+    Reported-by: Ludovic Courtès <[email protected]>
+    Signed-off-by: Siarhei Siamashka <[email protected]>
+    Reviewed-by: [email protected]
+    Signed-off-by: Oded Gabbay <[email protected]>
+
+commit 4297e9058d252cac653723fe0b1bee559fbac3a4
+Author: Thomas Petazzoni <[email protected]>
+Date:   Thu Sep 17 15:43:27 2015 +0200
+
+    test: add a check for FE_DIVBYZERO
+
+    Some architectures, such as Microblaze and Nios2, currently do not
+    implement FE_DIVBYZERO, even though they have <fenv.h> and
+    feenableexcept(). This commit adds a configure.ac check to verify
+    whether FE_DIVBYZERO is defined or not, and if not, disables the
+    problematic code in test/utils.c.
+
+    Signed-off-by: Thomas Petazzoni <[email protected]>
+    Signed-off-by: Marek Vasut <[email protected]>
+    Acked-by: Siarhei Siamashka <[email protected]>
+    Signed-off-by: Oded Gabbay <[email protected]>
+
+commit 8189fad9610981d5b4dcd8f8980ff169110fb33c
+Author: Oded Gabbay <[email protected]>
+Date:   Sun Sep 6 11:45:20 2015 +0300
+
+    vmx: Remove unused expensive functions
+
+    Now that we replaced the expensive functions with better performing
+    alternatives, we should remove them so they will not be used again.
+
+    Running Cairo benchmark on trimmed traces gave the following results:
+
+    POWER8, 8 cores, 3.4GHz, RHEL 7.2 ppc64le.
+
+    Speedups
+    ========
+    t-firefox-scrolling     1232.30 -> 1096.55 :  1.12x
+    t-gnome-terminal-vim     613.86 ->  553.10 :  1.11x
+    t-evolution              405.54 ->  371.02 :  1.09x
+    t-firefox-talos-gfx      919.31 ->  862.27 :  1.07x
+    t-gvim                   653.02 ->  616.85 :  1.06x
+    t-firefox-canvas-alpha   941.29 ->  890.42 :  1.06x
+
+    Signed-off-by: Oded Gabbay <[email protected]>
+    Acked-by: Pekka Paalanen <[email protected]>
+    Acked-by: Siarhei Siamashka <[email protected]>
+
+commit 6b1b8b2b90da11bf6101a151786b2a8c9f087338
+Author: Oded Gabbay <[email protected]>
+Date:   Sun Jun 28 13:17:41 2015 +0300
+
+    vmx: implement fast path vmx_composite_over_n_8_8888
+
+    POWER8, 8 cores, 3.4GHz, RHEL 7.2 ppc64le.
+
+    reference memcpy speed = 25008.9MB/s (6252.2MP/s for 32bpp fills)
+
+                    Before          After           Change
+                  ---------------------------------------------
+        L1          91.32           182.84          +100.22%
+        L2          94.94           182.83          +92.57%
+        M           95.55           181.51          +89.96%
+        HT          88.96           162.09          +82.21%
+        VT          87.4            168.35          +92.62%
+        R           83.37           146.23          +75.40%
+        RT          66.4            91.5            +37.80%
+        Kops/s      683             859             +25.77%
+
+    Signed-off-by: Oded Gabbay <[email protected]>
+    Acked-by: Pekka Paalanen <[email protected]>
+    Acked-by: Siarhei Siamashka <[email protected]>
+
+commit 8d8caa55a38c00351047d24322e23b201b6b29ff
+Author: Oded Gabbay <[email protected]>
+Date:   Sun Sep 6 11:46:15 2015 +0300
+
+    vmx: optimize vmx_composite_over_n_8888_8888_ca
+
+    This patch optimizes vmx_composite_over_n_8888_8888_ca by removing use
+    of expand_alpha_1x128, unpack/pack and in_over_2x128 in favor of
+    splat_alpha, in_over and MUL/ADD macros from pixman_combine32.h.
+
+    Running "lowlevel-blt-bench -n over_8888_8888" on POWER8, 8 cores,
+    3.4GHz, RHEL 7.2 ppc64le gave the following results:
+
+    reference memcpy speed = 23475.4MB/s (5868.8MP/s for 32bpp fills)
+
+                    Before          After           Change
+                  --------------------------------------------
+        L1          244.97          474.05          +93.51%
+        L2          243.74          473.05          +94.08%
+        M           243.29          467.16          +92.02%
+        HT          144.03          252.79          +75.51%
+        VT          174.24          279.03          +60.14%
+        R           109.86          149.98          +36.52%
+        RT          47.96           53.18           +10.88%
+        Kops/s      524             576             +9.92%
+
+    Signed-off-by: Oded Gabbay <[email protected]>
+    Acked-by: Pekka Paalanen <[email protected]>
+    Acked-by: Siarhei Siamashka <[email protected]>
+
+commit 857880f0e4d1d42a8508ac77be33556cc6f7f546
+Author: Oded Gabbay <[email protected]>
+Date:   Sun Sep 6 10:58:30 2015 +0300
+
+    vmx: optimize scaled_nearest_scanline_vmx_8888_8888_OVER
+
+    This patch optimizes scaled_nearest_scanline_vmx_8888_8888_OVER and all
+    the functions it calls (combine1, combine4 and
+    core_combine_over_u_pixel_vmx).
+
+    The optimization is done by removing use of expand_alpha_1x128 and
+    expand_alpha_2x128 in favor of splat_alpha and MUL/ADD macros from
+    pixman_combine32.h.
+
+    Running "lowlevel-blt-bench -n over_8888_8888" on POWER8, 8 cores,
+    3.4GHz, RHEL 7.2 ppc64le gave the following results:
+
+    reference memcpy speed = 24847.3MB/s (6211.8MP/s for 32bpp fills)
+
+                    Before          After           Change
+                  --------------------------------------------
+        L1          182.05          210.22          +15.47%
+        L2          180.6           208.92          +15.68%
+        M           180.52          208.22          +15.34%
+        HT          130.17          178.97          +37.49%
+        VT          145.82          184.22          +26.33%
+        R           104.51          129.38          +23.80%
+        RT          48.3            61.54           +27.41%
+        Kops/s      430             504             +17.21%
+
+    v2: Check *pm is not NULL before dereferencing it in combine1()
+
+    Signed-off-by: Oded Gabbay <[email protected]>
+    Acked-by: Pekka Paalanen <[email protected]>
+    Acked-by: Siarhei Siamashka <[email protected]>
+
+commit 73e586efb3ee149f76f15d9e549bffa15d8e30ec
+Author: Pekka Paalanen <[email protected]>
+Date:   Mon Sep 7 14:40:49 2015 +0300
+
+    armv6: enable over_n_8888
+
+    Enable the fast path added in the previous patch by moving the lookup
+    table entries to their proper locations.
+
+    Lowlevel-blt-bench benchmark statistics with 30 iterations, showing the
+    effect of adding this one patch on top of
+    "armv6: Add over_n_8888 fast path (disabled)", which was applied on
+    fd595692941f3d9ddea8934462bd1d18aed07c65.
+
+              Before          After
+            Mean StdDev     Mean StdDev   Confidence   Change
+    L1      12.5   0.04     45.2   0.10    100.00%    +263.1%
+    L2      11.1   0.02     43.2   0.03    100.00%    +289.3%
+    M        9.4   0.00     42.4   0.02    100.00%    +351.7%
+    HT       8.5   0.02     25.4   0.10    100.00%    +198.8%
+    VT       8.4   0.02     22.3   0.07    100.00%    +167.0%
+    R        8.2   0.02     23.1   0.09    100.00%    +183.6%
+    RT       5.4   0.05     11.4   0.21    100.00%    +110.3%
+
+    At most 3 outliers rejected per test per set.
+
+    Iterating here means that lowlevel-blt-bench was executed 30 times, and
+    the statistics above were computed from the output.
+
+    Signed-off-by: Pekka Paalanen <[email protected]>
+
+commit 9eb6889b15a180cc94aad8ac97189af5b3a68b96
+Author: Ben Avison <[email protected]>
+Date:   Mon Sep 7 14:40:48 2015 +0300
+
+    armv6: Add over_n_8888 fast path (disabled)
+
+    This new fast path is initially disabled by putting the entries in the
+    lookup table after the sentinel. The compiler cannot tell the new code
+    is not used, so it cannot eliminate the code. Also the lookup table size
+    will include the new fast path. When the follow-up patch then enables
+    the new fast path, the binary layout (alignments, size, etc.) will stay
+    the same compared to the disabled case.
+
+    Keeping the binary layout identical is important for benchmarking on
+    Raspberry Pi 1. The addresses at which functions are loaded will have a
+    significant impact on benchmark results, causing unexpected performance
+    changes. Keeping all function addresses the same across the patch
+    enabling a new fast path improves the reliability of benchmarks.
+
+    Benchmark results are included in the patch enabling this fast path.
+
+    [Pekka: disabled the fast path, commit message]
+    Signed-off-by: Pekka Paalanen <[email protected]>
+
+commit 4c71f595e3393be5b922df37d50d71dd83f4f979
+Author: Ben Avison <[email protected]>
+Date:   Wed Sep 2 20:35:59 2015 +0100
+
+    test: Add cover-test v5
+
+    This test aims to verify both numerical correctness and the honouring of
+    array bounds for scaled plots (both nearest-neighbour and bilinear) at or
+    close to the boundary conditions for applicability of "cover" type fast paths
+    and iter fetch routines.
+
+    It has a secondary purpose: by setting the env var EXACT (to any value) it
+    will only test plots that are exactly on the boundary condition. This makes
+    it possible to ensure that "cover" routines are being used to the maximum,
+    although this requires the use of a debugger or code instrumentation to
+    verify.
+
+    Changes in v4:
+
+    Check the fence page size and skip the test if it is too large.
+    Since we need to deal with pixman_fixed_t coordinates that go beyond the
+    real image width, make the page size limit 16 kB. A 32 kB or larger
+    page size would cause an a8 image width to be 32k or more, which is no
+    longer representable in pixman_fixed_t.
+
+    Use a shorthand variable 'filter' in test_cover().
+
+    Whitespace adjustments.
+
+    Changes in v5:
+
+    Skip if fenced memory is not supported. Do you know of any such
+    platform?
+
+    Signed-off-by: Ben Avison <[email protected]>
+    [Pekka: changes in v4 and v5]
+    Signed-off-by: Pekka Paalanen <[email protected]>
+    Reviewed-by: Ben Avison <[email protected]>
+    Acked-by: Oded Gabbay <[email protected]>
+
+commit 812c9c9758e1503bd1725af9c6fe9ede6a467506
+Author: Pekka Paalanen <[email protected]>
+Date:   Tue Sep 8 13:35:33 2015 +0300
+
+    implementation: add PIXMAN_DISABLE=wholeops
+
+    Add a new option to PIXMAN_DISABLE: "wholeops". This option disables all
+    whole-operation fast paths regardless of implementation level, except
+    the general path (general_composite_rect).
+
+    The purpose is to add a debug option that allows us to test optimized
+    iterator paths specifically. With this, it is possible to see if:
+    - fast paths mask bugs in iterators
+    - compare fast paths with iterator paths for performance
+
+    The effect was tested on x86_64 by running:
+    $ PIXMAN_DISABLE='' ./test/lowlevel-blt-bench over_8888_8888
+    $ PIXMAN_DISABLE='wholeops' ./test/lowlevel-blt-bench over_8888_8888
+
+    In the first case time is spent in sse2_composite_over_8888_8888(), and
+    in the latter in sse2_combine_over_u().
+
+    Signed-off-by: Pekka Paalanen <[email protected]>
+    Reviewed-by: Oded Gabbay <[email protected]>
+
+commit e9ef2cc4dea04792a03d604c075c344055765217
+Author: Pekka Paalanen <[email protected]>
+Date:   Tue Sep 8 09:36:48 2015 +0300
+
+    utils.[ch]: add fence_get_page_size()
+
+    Add a function to get the page size used for memory fence purposes, and
+    use it everywhere where getpagesize() was used.
+
+    This offers a single point in code to override the page size, in case
+    one wants to experiment how the tests work with a higher page size than
+    what the developer's machine has.
+
+    This also offers a clean API, without adding #ifdefs, to tests for
+    checking the page size.
+
+    Signed-off-by: Pekka Paalanen <[email protected]>
+    Reviewed-by: Oded Gabbay <[email protected]>
+    Reviewed-by: Ben Avison <[email protected]>
+
+commit 82f8c997dfd3f60a48134107ecf38663b464bdc9
+Author: Pekka Paalanen <[email protected]>
+Date:   Tue Sep 8 09:20:46 2015 +0300
+
+    utils.c: fix fallback code for fence_image_create_bits()
+
+    Used a wrong variable name, causing:
+    /home/pq/git/pixman/demos/../test/utils.c: In function ‘fence_image_create_bits’:
+    /home/pq/git/pixman/demos/../test/utils.c:562:46: error: ‘width’ undeclared (first use in this function)
+
+    Use the correct variable.
+
+    Signed-off-by: Pekka Paalanen <[email protected]>
+    Reviewed-by: Oded Gabbay <[email protected]>
+    Reviewed-by: Ben Avison <[email protected]>
+
+commit 07006853828a59b5e0cd7d7d058d03db4e23e6ec
+Author: Pekka Paalanen <[email protected]>
+Date:   Thu May 7 17:16:05 2015 +0300
+
+    test: add fence-image-self-test
+
+    Tests that fence_malloc and fence_image_create_bits actually work: that
+    out-of-bounds and out-of-row (unused stride area) accesses trigger
+    SIGSEGV.
+
+    If fence_malloc is a dummy (FENCE_MALLOC_ACTIVE not defined), this test
+    is skipped.
+
+    Changes in v2:
+
+    - check FENCE_MALLOC_ACTIVE value, not whether it is defined
+    - test that reading bytes near the fence pages does not cause a
+      segmentation fault
+
+    Changes in v3:
+
+    - Do not print progress messages unless VERBOSE environment variable is
+      set. Avoid spamming the terminal output of 'make check' on some
+      versions of autotools.
+
+    Signed-off-by: Pekka Paalanen <[email protected]>
+    Reviewed-by: Ben Avison <[email protected]>
+
+commit 13d93aa12050ce99643d56b0c730404294f46c2f
+Author: Pekka Paalanen <[email protected]>
+Date:   Thu May 7 16:46:01 2015 +0300
+
+    utils.[ch]: add fence_image_create_bits ()
+
+    Useful for detecting out-of-bounds accesses in composite operations.
+
+    This will be used by follow-up patches adding new tests.
+
+    Changes in v2:
+
+    - fix style on fence_image_create_bits args
+    - add page to stride only if stride_fence
+    - add comment on the fallback definition about freeing storage
+
+    Signed-off-by: Pekka Paalanen <[email protected]>
+    Reviewed-by: Ben Avison <[email protected]>
+
+commit c70ddd5c9e12d87ff461d73a6f53b00d52925cf5
+Author: Pekka Paalanen <[email protected]>
+Date:   Thu May 7 14:21:30 2015 +0300
+
+    utils.[ch]: add FENCE_MALLOC_ACTIVE
+
+    Define a new token to simplify checking whether fence_malloc() actually
+    can catch out-of-bounds access.
+
+    This will be used in the future to skip tests that rely on fence_malloc
+    checking functionality.
+
+    Changes in v2:
+
+    - #define FENCE_MALLOC_ACTIVE always, but change its value to help catch
+      use of it without including utils.h
+
+    Signed-off-by: Pekka Paalanen <[email protected]>
+    Reviewed-by: Ben Avison <[email protected]>
+
+commit a82e519944e5d1af41cc94a14d9ae1fe0e430e68
+Author: Ben Avison <[email protected]>
+Date:   Thu Aug 20 13:07:48 2015 +0100
+
+    scaling-test: list more details when verbose
+
+    Add mask details to the output.
+
+    [Pekka: redo whitespace and print src,dst,mask x and y.]
+    Signed-off-by: Pekka Paalanen <[email protected]>
+    Reviewed-by: Ben Avison <[email protected]>
+
+commit fd595692941f3d9ddea8934462bd1d18aed07c65
+Author: Pekka Paalanen <[email protected]>
+Date:   Tue Jul 7 11:31:20 2015 +0300
+
+    lowlevel-blt-bench: make extra arguments an error
+
+    If a user gives multiple patterns or extra arguments, only the last one
+    was used as the pattern while the former were just ignored. This is a
+    user error silently converted to something possibly unexpected.
+
+    In presence of extra arguments, complain and quit.
+
+    Cc: Ben Avison <[email protected]>
+    Signed-off-by: Pekka Paalanen <[email protected]>
+
+commit 69611473c5a4e7cc2e6016d82ff4ed28e289484a
+Author: Oded Gabbay <[email protected]>
+Date:   Sat Aug 1 23:01:43 2015 +0300
+
+    Post-release version bump to 0.33.3
+
+    Signed-off-by: Oded Gabbay <[email protected]>
+
 commit ee790044b08e3b668e6aa5d9229f46ed7295ebf0
 Author: Oded Gabbay <[email protected]>
 Date:   Sat Aug 1 22:34:53 2015 +0300
diff --git a/debian/changelog b/debian/changelog
index 42e6d85..be437ce 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,9 @@
+pixman (0.33.4-1) UNRELEASED; urgency=medium
+
+  * New upstream release candidate.
+
+ -- Andreas Boll <[email protected]> Wed, 04 Nov 2015 10:30:37 +0100
+
 pixman (0.33.2-2) sid; urgency=medium

   * Run tests with VERBOSE=1.
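
The stepping argument in the ChangeLog entry "Remove the 8e extra safety margin in COVER_CLIP analysis" above can be checked numerically. The following is a minimal C sketch, not pixman code: floor_div_64k(), transform_x() and step_is_exact() are hypothetical helpers written for this illustration. It assumes that the "/ 0x10000" in the commit message rounds toward negative infinity (as an arithmetic right shift by 16 would); with C's truncating "/" the identity can fail when the numerator changes sign.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: divide by 0x10000, rounding toward negative
 * infinity (assumed semantics of the "/ 0x10000" in the proof). */
static int64_t
floor_div_64k (int64_t n)
{
    int64_t q = n / 0x10000;

    if (n % 0x10000 < 0)
        q--;
    return q;
}

/* Hypothetical helper: first row of M * P from the commit message,
 *   (a * x_dst + b * y_dst + 0x8000) / 0x10000 + c
 * with all values in 16.16 fixed point. Operands are assumed small
 * enough that the 64-bit intermediate sum does not overflow. */
static int64_t
transform_x (int32_t a, int32_t b, int32_t c, int64_t x_dst, int64_t y_dst)
{
    int64_t sum = (int64_t) a * x_dst + (int64_t) b * y_dst + 0x8000;

    return floor_div_64k (sum) + c;
}

/* Returns 1 if advancing x_dst by one destination pixel (0x10000)
 * changes the transformed x coordinate by exactly 'a'. */
static int
step_is_exact (int32_t a, int32_t b, int32_t c, int64_t x_dst, int64_t y_dst)
{
    int64_t here = transform_x (a, b, c, x_dst, y_dst);
    int64_t next = transform_x (a, b, c, x_dst + 0x10000, y_dst);

    return next - here == a;
}
```

For example, step_is_exact (0x10001, 0, 0, 0x8000, 0x8000) holds even though 'a' is odd and low bits are lost in each individual division, which is the case the commit message singles out.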
commit fa71d08a81c9bf3f2366ee45474ff868d9e10b8e
Author: Oded Gabbay <[email protected]>
Date:   Fri Oct 23 17:58:49 2015 +0300

    Pre-release version bump to 0.33.4

    Signed-off-by: Oded Gabbay <[email protected]>

diff --git a/configure.ac b/configure.ac
index b04cc69..dcacff1 100644
--- a/configure.ac
+++ b/configure.ac
@@ -54,7 +54,7 @@ AC_PREREQ([2.57])

 m4_define([pixman_major], 0)
 m4_define([pixman_minor], 33)
-m4_define([pixman_micro], 3)
+m4_define([pixman_micro], 4)

 m4_define([pixman_version],[pixman_major.pixman_minor.pixman_micro])

commit 9728241bd098bc4260e6cd83997dfecc64adc356
Author: Andrea Canciani <[email protected]>
Date:   Tue Oct 13 13:35:59 2015 +0200

    test: Fix fence-image-self-test on Mac

    On MacOS X, according to the manpage of mprotect(), "When a program
    violates the protections of a page, it gets a SIGBUS or SIGSEGV
    signal.", but fence-image-self-test was only accepting a SIGSEGV as
    notification of invalid access.

    Fixes fence-image-self-test

    Reviewed-by: Pekka Paalanen <[email protected]>

diff --git a/test/fence-image-self-test.c b/test/fence-image-self-test.c
index c883038..c80b3cf 100644
--- a/test/fence-image-self-test.c
+++ b/test/fence-image-self-test.c
@@ -73,7 +73,7 @@ prinfo (const char *fmt, ...)
 }

 static void
-do_expect_segv (void (*fn)(void *), void *data)
+do_expect_signal (void (*fn)(void *), void *data)
 {
     struct sigaction sa;

@@ -82,6 +82,8 @@ do_expect_segv (void (*fn)(void *), void *data)
     sa.sa_sigaction = segv_handler;
     if (sigaction (SIGSEGV, &sa, NULL) == -1)
         die ("sigaction failed", errno);
+    if (sigaction (SIGBUS, &sa, NULL) == -1)
+        die ("sigaction failed", errno);

     (*fn)(data);

@@ -96,7 +98,7 @@ do_expect_segv (void (*fn)(void *), void *data)
  * to exit with success, and return failure otherwise.
  */
 static pixman_bool_t
-expect_segv (void (*fn)(void *), void *data)
+expect_signal (void (*fn)(void *), void *data)
 {
     pid_t pid, wp;
     int status;
@@ -106,7 +108,7 @@ expect_segv (void (*fn)(void *), void *data)
         die ("fork failed", errno);

     if (pid == 0)
-        do_expect_segv (fn, data); /* never returns */
+        do_expect_signal (fn, data); /* never returns */

     wp = waitpid (pid, &status, 0);
     if (wp != pid)
@@ -131,9 +133,9 @@ test_read_fault (uint8_t *p, int offset)
 {
     prinfo ("*(uint8_t *)(%p + %d)", p, offset);

-    if (expect_segv (read_u8, p + offset))
+    if (expect_signal (read_u8, p + offset))
     {
-        prinfo ("\tSEGV OK\n");
+        prinfo ("\tsignal OK\n");
         return TRUE;
     }

diff --git a/test/utils.c b/test/utils.c
index 8657966..f8e42a5 100644
--- a/test/utils.c
+++ b/test/utils.c
@@ -471,9 +471,9 @@ fence_image_destroy (pixman_image_t *image, void *data)
  * min_width is only a minimum width for the image. The width is aligned up
  * for the row size to be divisible by both page size and pixel size.
  *
- * If stride_fence is true, the additional page on each row will be armed
- * to cause SIGSEVG on all accesses. This should catch all accesses outside
- * the valid row pixels.
+ * If stride_fence is true, the additional page on each row will be
+ * armed to cause SIGSEGV or SIGBUS on all accesses. This should catch
+ * all accesses outside the valid row pixels.
  */
 pixman_image_t *
 fence_image_create_bits (pixman_format_code_t format,
commit 7de61d8d14e84623b6fa46506eb74f938287f536
Author: Matt Turner <[email protected]>
Date:   Sun Oct 11 14:44:46 2015 -0700

    mmx: Use MMX2 intrinsics from xmmintrin.h directly.

    We had lots of hacks to handle the inability to include xmmintrin.h
    without compiling with -msse (lest SSE instructions be used in
    pixman-mmx.c). Some recent version of gcc relaxed this restriction.

    Change configure.ac to test that xmmintrin.h can be included and that we
    can use some intrinsics from it, and remove the work-around code from
    pixman-mmx.c.

    Evidently allows gcc 4.9.3 to optimize better as well:

       text    data     bss     dec     hex filename
     657078   30848     680  688606   a81de libpixman-1.so.0.33.3 before
     656710   30848     680  688238   a806e libpixman-1.so.0.33.3 after

    Reviewed-by: Siarhei Siamashka <[email protected]>
    Tested-by: Pekka Paalanen <[email protected]>
    Signed-off-by: Matt Turner <[email protected]>

diff --git a/configure.ac b/configure.ac
index 424bfd3..b04cc69 100644
--- a/configure.ac
+++ b/configure.ac
@@ -347,21 +347,14 @@ AC_COMPILE_IFELSE([AC_LANG_SOURCE([[
 #error "Need GCC >= 3.4 for MMX intrinsics"
 #endif
 #include <mmintrin.h>
+#include <xmmintrin.h>
 int main () {
     __m64 v = _mm_cvtsi32_si64 (1);
     __m64 w;

-    /* Some versions of clang will choke on K */
-    asm ("pshufw %2, %1, %0\n\t"
-        : "=y" (w)
-        : "y" (v), "K" (5)
-    );
-
-    /* Some versions of clang will choke on this */
-    asm ("pmulhuw %1, %0\n\t"
-        : "+y" (w)
-        : "y" (v)
-    );
+    /* Test some intrinsics from xmmintrin.h */
+    w = _mm_shuffle_pi16(v, 5);
+    w = _mm_mulhi_pu16(w, w);

     return _mm_cvtsi64_si32 (v);
 }]])], have_mmx_intrinsics=yes)
diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c
index 05c48a4..88c3a39 100644
--- a/pixman/pixman-mmx.c
+++ b/pixman/pixman-mmx.c
@@ -40,6 +40,9 @@
 #else
 #include <mmintrin.h>
 #endif
+#ifdef USE_X86_MMX
+#include <xmmintrin.h>
+#endif
 #include "pixman-private.h"
 #include "pixman-combine32.h"
 #include "pixman-inlines.h"
@@ -59,66 +62,7 @@ _mm_empty (void)
 }
 #endif

-#ifdef USE_X86_MMX
-# if (defined(__SUNPRO_C) || defined(_MSC_VER) || defined(_WIN64))
-#  include <xmmintrin.h>
-# else
-/* We have to compile with -msse to use xmmintrin.h, but that causes SSE
- * instructions to be generated that we don't want. Just duplicate the
- * functions we want to use.
- */
-extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_movemask_pi8 (__m64 __A)
-{
-    int ret;
-
-    asm ("pmovmskb %1, %0\n\t"
-        : "=r" (ret)
-        : "y" (__A)
-    );
-
-    return ret;
-}
-
-extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mulhi_pu16 (__m64 __A, __m64 __B)
-{
-    asm ("pmulhuw %1, %0\n\t"
-        : "+y" (__A)
-        : "y" (__B)
-    );
-    return __A;
-}
-
-# ifdef __OPTIMIZE__
-extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_shuffle_pi16 (__m64 __A, int8_t const __N)
-{
-    __m64 ret;
-
-    asm ("pshufw %2, %1, %0\n\t"
-        : "=y" (ret)
-        : "y" (__A), "K" (__N)
-    );
-
-    return ret;
-}
-# else
-# define _mm_shuffle_pi16(A, N) \
-    ({ \
-    __m64 ret; \
-    \
-    asm ("pshufw %2, %1, %0\n\t" \
-        : "=y" (ret) \
-        : "y" (A), "K" ((const int8_t)N) \
-    ); \
-    \
-    ret; \
-    })
-# endif
-# endif
-#endif
-
-#ifndef _MSC_VER
+#ifndef _MM_SHUFFLE
 #define _MM_SHUFFLE(fp3,fp2,fp1,fp0) \
  (((fp3) << 6) | ((fp2) << 4) | ((fp1) << 2) | (fp0))
 #endif
commit 90e62c086766afffd289a321c7de8ea4b5cac87d
Author: Siarhei Siamashka <[email protected]>
Date:   Fri Sep 4 15:39:00 2015 +0300

    vmx: implement fast path vmx_composite_over_n_8888

    Running "lowlevel-blt-bench over_n_8888" on Playstation3 3.2GHz,
    Gentoo ppc (32-bit userland) gave the following results:

    before:  over_n_8888 =  L1: 147.47  L2: 205.86  M:121.07
    after:   over_n_8888 =  L1: 287.27  L2: 261.09  M:133.48

    Cairo non-trimmed benchmarks on POWER8, 3.4GHz 8 Cores:

    ocitysmap           659.69 -> 611.71   :  1.08x speedup
    xfce4-terminal-a1  2725.22 -> 2547.47  :  1.07x speedup
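
For readers unfamiliar with the operation being accelerated here: over_n_8888 composites a solid ("n") colour over 32-bit destination pixels using the OVER operator. The scalar sketch below is for illustration only and is not the VMX code from this commit. mul_un8(), over_pixel() and over_n_8888_scalar() are hypothetical names; the sketch assumes premultiplied a8r8g8b8 pixels and uses the common (t + (t >> 8)) >> 8 idiom for a rounded division by 255, which is an assumption rather than something copied from pixman.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply two 8-bit values treated as fractions of 255, with rounding. */
static uint8_t
mul_un8 (uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t) a * b + 0x80;

    return (uint8_t) ((t + (t >> 8)) >> 8);
}

/* OVER for one premultiplied pixel, per channel:
 *   result = src + dest * (255 - src_alpha) / 255, saturating at 255. */
static uint32_t
over_pixel (uint32_t src, uint32_t dest)
{
    uint8_t inv_alpha = (uint8_t) (255 - (src >> 24));
    uint32_t result = 0;
    int shift;

    for (shift = 0; shift < 32; shift += 8)
    {
        uint32_t s = (src >> shift) & 0xff;
        uint32_t d = mul_un8 ((uint8_t) ((dest >> shift) & 0xff), inv_alpha);
        uint32_t sum = s + d;

        if (sum > 255)
            sum = 255;
        result |= sum << shift;
    }
    return result;
}

/* Composite one solid colour over a scanline of 'width' pixels. */
static void
over_n_8888_scalar (uint32_t src, uint32_t *dest, size_t width)
{
    size_t i;

    for (i = 0; i < width; i++)
        dest[i] = over_pixel (src, dest[i]);
}
```

A vector fast path performs the same per-pixel arithmetic on several pixels at a time; the benchmark figures above measure that difference, not a change in results.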

