Re: [Pixman] [PATCH 5/5] test: Add a new benchmarker targeting affine operations
On Thu, 23 Apr 2015 17:12:59 +0100, I wrote: I imagine most of the time, you'll have a source image of fixed size, and you'll either have a target destination size (in which case your task is to calculate the transform matrix coefficients) or you'll have a target scaling factor (in which case your task is to work out the destination size that can be achieved without running off the sides of the source image). Just to add to this, an example has come to mind that demonstrates why higher level software should really know about the pixman_fixed_e offset for nearest-neighbour scaling in order to make best use of Pixman. Assume we have a source image that's 101 pixels wide, and we know it has to be plotted at a scale factor of 50%. How many destination pixels do we tell Pixman to plot? In reality, given a transform matrix whose diagonal is (0.5, 0.5, 1), Pixman is capable out picking out source pixels with index 0, 2, 4, 6 ... 100 - that's 51 pixels - whilst still using a COVER_CLIP fast path. If it weren't for the pixman_fixed_e offset, it would pick out source pixels 1, 3, 5, 7 ... 99 instead - that's 50 pixels. Now, if the caller asks for 50 pixels and Pixman uses the offset, then we get to use the COVER_CLIP fast path, but we've trimmed two pixels off the right and none off the left. But if the caller asks for 51 pixels and Pixman doesn't use the offset, then Pixman realises it needs a value for out-of-bounds pixel index 101 so won't use the COVER_CLIP fast path. So, for the best combination of image quality and speed, the details of Pixman's rounding algorithm *can* be significant. Ben ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 5/5] test: Add a new benchmarker targeting affine operations
On Wed, 08 Apr 2015 02:27:48 +0100 Ben Avison bavi...@riscosopen.org wrote: On Tue, 17 Mar 2015 11:12:53 -, Pekka Paalanen ppaala...@gmail.com wrote: If there is no transform, why not return the original extents? These have been reduced by a half unit in all four directions. It's more obviously relevant in the bilinear scaling case, but a pixel's colour is considered to apply to the midpoint of each pixel. Thus an image of width N is specifying colours at ordinates 0.5, 1.5, 2.5 ... (N-1.5), (N-0.5). Therefore, when you're working out which source pixels are required to generate a range of output pixels, it's 0.5 and (N-0.5) that you need to feed through any transformation matrix. If the no transform case were special-cased, you'd then need to special-case it again in the calling function because the 0.5 offset applies (albeit at different scaling factors) to both source and destination images, so the caller will be adding the 0.5 pixel border back on again. Ok, so the extents going in to compute_transformed_extents() have a different meaning than the ones coming out. The ones going in define the edges of the image, while the ones coming out are reduced to the corner pixel centers. Which means that it is incorrect to, say, compute extents for two sequential transforms one by one but you must combine the transforms first and then call compute_transformed_extents() just once. I was just surprised by this convention. In affine-bench, I calculate image sizes based on bilinear scaling for simplicity, since a source or mask image big enough to satisfy COVER_CLIP for bilinear scaling will also do so for nearest scaling. [compute_transformed_extents] +tx1 = ty1 = INT64_MAX; +tx2 = ty2 = INT64_MIN; + +for (i = 0; i 4; ++i) +{ +pixman_fixed_48_16_t tx, ty; +pixman_vector_t v; + +v.vector[0] = (i 0x01)? x1 : x2; +v.vector[1] = (i 0x02)? y1 : y2; +v.vector[2] = pixman_fixed_1; + +if (!pixman_transform_point (transform, v)) +return FALSE; + +tx = (pixman_fixed_48_16_t)v.vector[0]; +ty = (pixman_fixed_48_16_t)v.vector[1]; This works only for affine. This is called affine-bench, so that is natural, yet might be nice to add an assert or something to guarantee no-one will copy this code to something else that uses projective transforms. I might be missing something, but it looks to me like pixman_transform_point (through its subroutine pixman_transform_point_31_16) does handle projective transforms, and what this code segment is doing is iterating over the four corners of the destination rectangle and finding the maximum and minimum source x/y by considering the transformed position of each corner independently. Surely the whole set of destination pixels, once passed through even a projective matrix, must still lie within this bounding box? Ahh, another case where I had wrong assumptions. pixman_transform_point() indeed does guarantee, that v.vector[2] is 1.0. I'm used to assume it's not guaranteed because the guarantee leads to a division by zero for points at infinity. But Pixman cannot deal with points at infinity anyway, so it sort of makes sense in relation to pixel images. [flush_cache] +volatile const char *x = clean_space; [...] +(void) *x; Are you sure this actually does the read? That the compiler is not allowed to discard the read? That's something I'm never sure of. That's the point of the volatile qualifier - the compiler isn't allowed to throw away accesses to an object which is declared volatile, it has to perform exactly as many accesses as you ask for and in the same order. From the C standard: A volatile declaration may be used to describe an object corresponding to a memory-mapped input/output port or an object accessed by an asynchronously interrupting function. Actions on objects so declared shall not be 'optimized out' by an implementation or reordered except as permitted by the rules for evaluating expressions. What you do need to be careful of these days is that this says nothing about the number of or ordering of accesses as viewed by any part of the system beyond the L1 cache, including any other cores - that's where memory barriers come in. But that's not relevant for the purposes of cleaning the cache. Very good. +x += 32; Why 32? Is this CACHELINE_LENGTH from lowlevel-blt-bench.c? Yes, that's the origin. You might argue for reading the cacheline length at runtime, as with cache and page size, but it's less critical here. If your cacheline length was actually 64 bytes, then reading every 32nd byte would still pull in every cacheline. Reading every single byte would have the same effect but be a bit of a waste of time; the increment only needs to be as short as the shortest cacheline you're likely to run with. I guess I
Re: [Pixman] [PATCH 5/5] test: Add a new benchmarker targeting affine operations
On Tue, 14 Apr 2015 11:28:42 +0100, Pekka Paalanen ppaala...@gmail.com wrote: So the matrices are indexed as matrix[row][column] in Pixman? That's my understanding, yes. A short'ish comment somewhere to say why you are doing this offsetting would be nice, and that the offsetting is the reason to allocate a margin. Since you've been reworking the patches anyway, do you have enough information to add the comments yourself or do you want me to do them? One simple way round this is to apply a 0.5 pixel translation in the transformation matrix, so that the pattern becomes: d[0] =1.00 * s[0] d[1] =0.50 * s[0] + 0.50 * s[1] d[2] = 1.00 * s[1] d[3] = 0.50 * s[1] + 0.50 * s[2] but this is thwarted by the 8/65536ths of a pixel fudge factor. I can't see why the fudge factor is needed at all, at least not in the affine case; it's described as being to compensate for rounding errors, but there is no rounding involved in fixed-point affine transforms. Maybe the rounding refers to the pixman_fixed_to_int() calls? I could imagine it is to guarantee that an xmin=0.5 gets really rounded to 0, and xmax=0.5 gets rounded to 1. There's a slightly better worked example in the later patch I was thinking of: http://lists.freedesktop.org/archives/pixman/2014-September/003380.html As I say, we're getting a bit ahead of ourselves here, as there were about 30 patches between the ones we're reworking at the moment and that one. Søren did give some comments on it at the end of his review here: http://lists.freedesktop.org/archives/pixman/2014-October/003457.html which says the 8*pixman_fixed_e was about ensuring we didn't stray beyond the bitmap bounds if a fast path happened to over-read pixels and then multiply them by a 0 filter coefficient. I think we can probably cater for that better by checking whether a bitmap starts or ends at a page boundary, and whether we're reading the first and last pixels of the image if it does. To be honest, both cases you describe sound strange to me. Surely if I use an affine transform matrix { 0.5 0 0; 0 1 0 }, I'd expect d[0] = 0.5 * s[0] + 0.5 * s[1] when assuming the box filter (if I got my terminology right)... You're forgetting it's pixel-centres that are fed through the transform. To be honest, I think it's already quite a headache for an application to work out how to set up the transform in order to hit a cover scaled fast path. Assume the reasonable case that you want to plot the whole of an image of size x,y at a size m,n. You need to set the diagonals of the transform to floor(pixman_fixed_1*(x-1)/(m-1)) floor(pixman_fixed_1*(y-1)/(n-1)) 1 and then solve for / 0.5 \ / 0.5 \ T . | 0.5 | = | 0.5 | \ 1 / \ 1 / to find the translation coefficients that will ensure the top-left source pixel aligns exactly to the top-left destination pixel. To then require that the caller also knows that you need to nudge things along by 8*pixman_fixed_e as well feels like it's requiring too much knowledge of Pixman's internals to me. I didn't really like putting it in affine-bench to begin with for this reason, but it doesn't work without it. So I made do with removing it as quickly as I could - the original intention was for it to be in the same patch series, but obviously the patch series has grown too large and unwieldy for that the be the case any more! When I've seen timing breakdowns of which Pixman operations are being hit by applications, I've seen far more scaled plots with repeat type PAD than I'd really expect. My suspicions are that the bulk of these are down to how difficult it is for applications to set up the transform for maximum efficiency. I think my hope was that since we're processing images of size 1920 * 1080 pixels, you wouldn't have any particular pixels persisting in the cache from the previous iteration (unless perhaps if you're testing a high-factor enlargement on a platform that has caches that don't allocate on write). Right, so is there any need to flush_cache() at all? Possibly not. I was taking my cue from lowlevel-blt-bench again, where it flushes the cache between each type of test. And looking at the implementation of flush_cache() again now, I note that due to the action of allocate-on-write and physically mapped caches, it won't completely flush most caches, only one page-size worth! Perhaps just delete it then. +bench (op, t, src_image, mask_image, dest_image, src_x, src_y, UINT32_MAX, 500, n, t1, pixman_image_composite_wrapper); +bench (op, t, src_image, mask_image, dest_image, src_x, src_y, n, UINT32_MAX, NULL, t2, pixman_image_composite_empty); [...] I think that part I understood. My comment was about having a {} block without any flow control statements for it. You are just using it to scope some variables, which suggests that the block should actually be a separate function. Oh right. Well, I think
Re: [Pixman] [PATCH 5/5] test: Add a new benchmarker targeting affine operations
On Tue, 17 Mar 2015 11:12:53 -, Pekka Paalanen ppaala...@gmail.com wrote: affine-bench should be added to .gitignore too. OK. +#define L2CACHE_SIZE (128 * 1024) I see lowlevel-blt-bench.c also uses L2CACHE_SIZE, but where does it come from? If it's really an arch/platform-independent constant, maybe some test header could have it? As you will no doubt have noticed, affine-bench is heavily inspired by lowlevel-blt-bench. Admittedly it's a bit of a cheat to assume one L2 cache size, but in lowlevel-blt-bench, it's only used for sizing the images used for the L2 test. For those purposes, it's only important that it's a number big enough that the images are flushed out of the L1 cache but small enough that they aren't flushed out to memory, so as long as it's bigger than the largest L1 cache you're likely to encounter and no smaller than the smallest L2 cache, then it'll do, and saves lots of messy OS-specific code to determine the cache sizes. Admittedly, now I look back on the use of the same constant in affine-bench, I've taken the point of view that it's too difficult in the presence of significant transformation coefficients to calculate image sizes that sit within a particular cache, so I'm only bothering to test the memory-constrained case. To do that, I'm using L2CACHE_SIZE to size an array used to flush the L2 cache - but that needs the *largest* typical L2 cache size, not the smallest, so 128K won't cut it here. While I have easy access to the cache sizes on CPUs that I'm actively targeting, a general-purpose benchmarker should probably be suitable for all architectures. Should we read it from the OS somehow? (Anyone know how best to do that for, say, Linux, Windows, OS X and iOS, in that case?) Looks like Pixman style is using separate statements for struct definition and typedefs. Not exclusively - I think the bits I've looked at usually do, but now that I count them, I grant you the most common form is separate typedefs. This particular example was cut-and-pasted from pixman.c but can easily be changed. +static pixman_bool_t +compute_transformed_extents (pixman_transform_t *transform, + const pixman_box32_t *extents, + box_48_16_t *transformed) CODING_STYLE is asking for alignment here for the arg names. This was also a cut-and-paste, but good point. If there is no transform, why not return the original extents? These have been reduced by a half unit in all four directions. It's more obviously relevant in the bilinear scaling case, but a pixel's colour is considered to apply to the midpoint of each pixel. Thus an image of width N is specifying colours at ordinates 0.5, 1.5, 2.5 ... (N-1.5), (N-0.5). Therefore, when you're working out which source pixels are required to generate a range of output pixels, it's 0.5 and (N-0.5) that you need to feed through any transformation matrix. If the no transform case were special-cased, you'd then need to special-case it again in the calling function because the 0.5 offset applies (albeit at different scaling factors) to both source and destination images, so the caller will be adding the 0.5 pixel border back on again. In affine-bench, I calculate image sizes based on bilinear scaling for simplicity, since a source or mask image big enough to satisfy COVER_CLIP for bilinear scaling will also do so for nearest scaling. [compute_transformed_extents] +tx1 = ty1 = INT64_MAX; +tx2 = ty2 = INT64_MIN; + +for (i = 0; i 4; ++i) +{ +pixman_fixed_48_16_t tx, ty; +pixman_vector_t v; + +v.vector[0] = (i 0x01)? x1 : x2; +v.vector[1] = (i 0x02)? y1 : y2; +v.vector[2] = pixman_fixed_1; + +if (!pixman_transform_point (transform, v)) +return FALSE; + +tx = (pixman_fixed_48_16_t)v.vector[0]; +ty = (pixman_fixed_48_16_t)v.vector[1]; This works only for affine. This is called affine-bench, so that is natural, yet might be nice to add an assert or something to guarantee no-one will copy this code to something else that uses projective transforms. I might be missing something, but it looks to me like pixman_transform_point (through its subroutine pixman_transform_point_31_16) does handle projective transforms, and what this code segment is doing is iterating over the four corners of the destination rectangle and finding the maximum and minimum source x/y by considering the transformed position of each corner independently. Surely the whole set of destination pixels, once passed through even a projective matrix, must still lie within this bounding box? [create_image] Too long line, needs breaking up. OK. [various places] Add empty line. OK. +*bits = aligned_malloc (4096, stride * height); Is there a #define to get the 4096 from? I assume it's page size? What if the platform has big pages? I assume the same thing. It's quite a common page size as I understand it. I know it's not
Re: [Pixman] [PATCH 5/5] test: Add a new benchmarker targeting affine operations
On Tue, 3 Mar 2015 15:24:20 + Ben Avison bavi...@riscosopen.org wrote: --- test/Makefile.sources |1 + test/affine-bench.c | 330 + 2 files changed, 331 insertions(+), 0 deletions(-) create mode 100644 test/affine-bench.c diff --git a/test/Makefile.sources b/test/Makefile.sources index c20c34b..8b0e855 100644 --- a/test/Makefile.sources +++ b/test/Makefile.sources @@ -37,6 +37,7 @@ OTHERPROGRAMS = \ radial-perf-test\ check-formats \ scaling-bench \ + affine-bench\ $(NULL) affine-bench should be added to .gitignore too. # Utility functions diff --git a/test/affine-bench.c b/test/affine-bench.c new file mode 100644 index 000..55478d6 --- /dev/null +++ b/test/affine-bench.c @@ -0,0 +1,330 @@ +/* + * Copyright © 2014 RISC OS Open Ltd + * + * Permission to use, copy, modify, distribute, and sell this software and its + * documentation for any purpose is hereby granted without fee, provided that + * the above copyright notice appear in all copies and that both that + * copyright notice and this permission notice appear in supporting + * documentation, and that the name of the copyright holders not be used in + * advertising or publicity pertaining to distribution of the software without + * specific, written prior permission. The copyright holders make no + * representations about the suitability of this software for any purpose. It + * is provided as is without express or implied warranty. + * + * THE COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO THIS + * SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND + * FITNESS, IN NO EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY + * SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN + * AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING + * OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS + * SOFTWARE. + * + * Author: Ben Avison (bavi...@riscosopen.org) + */ + +#include stdio.h +#include stdlib.h +#include string.h +#include ctype.h +#include stdint.h +#include utils.h + +#ifdef HAVE_GETTIMEOFDAY +#include sys/time.h +#else +#include time.h +#endif + +#define WIDTH 1920 +#define HEIGHT 1080 +#define L2CACHE_SIZE (128 * 1024) Hi, I see lowlevel-blt-bench.c also uses L2CACHE_SIZE, but where does it come from? If it's really an arch/platform-independent constant, maybe some test header could have it? + +typedef struct +{ +pixman_fixed_48_16_tx1; +pixman_fixed_48_16_ty1; +pixman_fixed_48_16_tx2; +pixman_fixed_48_16_ty2; +} box_48_16_t; Looks like Pixman style is using separate statements for struct definition and typedefs. + +static pixman_bool_t +compute_transformed_extents (pixman_transform_t *transform, + const pixman_box32_t *extents, + box_48_16_t *transformed) CODING_STYLE is asking for alignment here for the arg names. +{ +pixman_fixed_48_16_t tx1, ty1, tx2, ty2; +pixman_fixed_t x1, y1, x2, y2; +int i; + +x1 = pixman_int_to_fixed (extents-x1) + pixman_fixed_1 / 2; +y1 = pixman_int_to_fixed (extents-y1) + pixman_fixed_1 / 2; +x2 = pixman_int_to_fixed (extents-x2) - pixman_fixed_1 / 2; +y2 = pixman_int_to_fixed (extents-y2) - pixman_fixed_1 / 2; + +if (!transform) +{ +transformed-x1 = x1; +transformed-y1 = y1; +transformed-x2 = x2; +transformed-y2 = y2; If there is no transform, why not return the original extents? These have been reduced by a half unit in all four directions. + +return TRUE; +} + +tx1 = ty1 = INT64_MAX; +tx2 = ty2 = INT64_MIN; + +for (i = 0; i 4; ++i) +{ +pixman_fixed_48_16_t tx, ty; +pixman_vector_t v; + +v.vector[0] = (i 0x01)? x1 : x2; +v.vector[1] = (i 0x02)? y1 : y2; +v.vector[2] = pixman_fixed_1; + +if (!pixman_transform_point (transform, v)) +return FALSE; + +tx = (pixman_fixed_48_16_t)v.vector[0]; +ty = (pixman_fixed_48_16_t)v.vector[1]; This works only for affine. This is called affine-bench, so that is natural, yet might be nice to add an assert or something to guarantee no-one will copy this code to something else that uses projective transforms. Or, maybe having affine as part of the function name? + +if (tx tx1) +tx1 = tx; +if (ty ty1) +ty1 = ty; +if (tx tx2) +tx2 = tx; +if (ty ty2) +ty2 = ty; +} + +transformed-x1 = tx1; +transformed-y1 = ty1; +transformed-x2 = tx2; +transformed-y2 = ty2; + +return TRUE; +} +