Re: [Pixman] [PATCH 5/5] test: Add a new benchmarker targeting affine operations

2015-04-23 Thread Ben Avison

On Thu, 23 Apr 2015 17:12:59 +0100, I wrote:

I imagine most of the time, you'll have a source image of fixed size, and
you'll either have a target destination size (in which case your task is
to calculate the transform matrix coefficients) or you'll have a target
scaling factor (in which case your task is to work out the destination
size that can be achieved without running off the sides of the source
image).


Just to add to this, an example has come to mind that demonstrates why
higher level software should really know about the pixman_fixed_e offset
for nearest-neighbour scaling in order to make best use of Pixman.

Assume we have a source image that's 101 pixels wide, and we know it has
to be plotted at a scale factor of 50%. How many destination pixels do we
tell Pixman to plot?

In reality, given a transform matrix whose diagonal is (0.5, 0.5, 1),
Pixman is capable out picking out source pixels with index 0, 2, 4, 6 ...
100 - that's 51 pixels - whilst still using a COVER_CLIP fast path. If it
weren't for the pixman_fixed_e offset, it would pick out source pixels 1,
3, 5, 7 ... 99 instead - that's 50 pixels.

Now, if the caller asks for 50 pixels and Pixman uses the offset, then we
get to use the COVER_CLIP fast path, but we've trimmed two pixels off the
right and none off the left.

But if the caller asks for 51 pixels and Pixman doesn't use the offset,
then Pixman realises it needs a value for out-of-bounds pixel index 101
so won't use the COVER_CLIP fast path.

So, for the best combination of image quality and speed, the details of
Pixman's rounding algorithm *can* be significant.

Ben
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH 5/5] test: Add a new benchmarker targeting affine operations

2015-04-14 Thread Pekka Paalanen
On Wed, 08 Apr 2015 02:27:48 +0100
Ben Avison bavi...@riscosopen.org wrote:

 On Tue, 17 Mar 2015 11:12:53 -, Pekka Paalanen ppaala...@gmail.com 
 wrote:

  If there is no transform, why not return the original extents?
  These have been reduced by a half unit in all four directions.
 
 It's more obviously relevant in the bilinear scaling case, but a pixel's
 colour is considered to apply to the midpoint of each pixel. Thus an
 image of width N is specifying colours at ordinates 0.5, 1.5, 2.5 ...
 (N-1.5), (N-0.5). Therefore, when you're working out which source pixels
 are required to generate a range of output pixels, it's 0.5 and (N-0.5)
 that you need to feed through any transformation matrix. If the no
 transform case were special-cased, you'd then need to special-case it
 again in the calling function because the 0.5 offset applies (albeit at
 different scaling factors) to both source and destination images, so the
 caller will be adding the 0.5 pixel border back on again.

Ok, so the extents going in to compute_transformed_extents() have a
different meaning than the ones coming out. The ones going in define
the edges of the image, while the ones coming out are reduced to the
corner pixel centers.

Which means that it is incorrect to, say, compute extents for two
sequential transforms one by one but you must combine the transforms
first and then call compute_transformed_extents() just once.

I was just surprised by this convention.

 In affine-bench, I calculate image sizes based on bilinear scaling for
 simplicity, since a source or mask image big enough to satisfy COVER_CLIP
 for bilinear scaling will also do so for nearest scaling.
 
 [compute_transformed_extents]
  +tx1 = ty1 = INT64_MAX;
  +tx2 = ty2 = INT64_MIN;
  +
  +for (i = 0; i  4; ++i)
  +{
  +pixman_fixed_48_16_t tx, ty;
  +pixman_vector_t v;
  +
  +v.vector[0] = (i  0x01)? x1 : x2;
  +v.vector[1] = (i  0x02)? y1 : y2;
  +v.vector[2] = pixman_fixed_1;
  +
  +if (!pixman_transform_point (transform, v))
  +return FALSE;
  +
  +tx = (pixman_fixed_48_16_t)v.vector[0];
  +ty = (pixman_fixed_48_16_t)v.vector[1];
 
  This works only for affine. This is called affine-bench, so that is
  natural, yet might be nice to add an assert or something to guarantee
  no-one will copy this code to something else that uses projective
  transforms.
 
 I might be missing something, but it looks to me like
 pixman_transform_point (through its subroutine
 pixman_transform_point_31_16) does handle projective transforms, and what
 this code segment is doing is iterating over the four corners of the
 destination rectangle and finding the maximum and minimum source x/y by
 considering the transformed position of each corner independently. Surely
 the whole set of destination pixels, once passed through even a
 projective matrix, must still lie within this bounding box?

Ahh, another case where I had wrong assumptions.
pixman_transform_point() indeed does guarantee, that v.vector[2] is
1.0. I'm used to assume it's not guaranteed because the guarantee
leads to a division by zero for points at infinity. But Pixman cannot
deal with points at infinity anyway, so it sort of makes sense in
relation to pixel images.

 [flush_cache]
  +volatile const char *x = clean_space;
 [...]
  +(void) *x;
 
  Are you sure this actually does the read? That the compiler is not
  allowed to discard the read? That's something I'm never sure of.
 
 That's the point of the volatile qualifier - the compiler isn't allowed
 to throw away accesses to an object which is declared volatile, it has to
 perform exactly as many accesses as you ask for and in the same order.
  From the C standard:
 
 A volatile declaration may be used to describe an object
 corresponding to a memory-mapped input/output port or an object
 accessed by an asynchronously interrupting function. Actions on
 objects so declared shall not be 'optimized out' by an
 implementation or reordered except as permitted by the rules for
 evaluating expressions.
 
 What you do need to be careful of these days is that this says nothing
 about the number of or ordering of accesses as viewed by any part of the
 system beyond the L1 cache, including any other cores - that's where
 memory barriers come in. But that's not relevant for the purposes of
 cleaning the cache.

Very good.

  +x += 32;
 
  Why 32? Is this CACHELINE_LENGTH from lowlevel-blt-bench.c?
 
 Yes, that's the origin. You might argue for reading the cacheline length
 at runtime, as with cache and page size, but it's less critical here. If
 your cacheline length was actually 64 bytes, then reading every 32nd byte
 would still pull in every cacheline. Reading every single byte would have
 the same effect but be a bit of a waste of time; the increment only needs
 to be as short as the shortest cacheline you're likely to run with. I
 guess I 

Re: [Pixman] [PATCH 5/5] test: Add a new benchmarker targeting affine operations

2015-04-14 Thread Ben Avison

On Tue, 14 Apr 2015 11:28:42 +0100, Pekka Paalanen ppaala...@gmail.com wrote:

So the matrices are indexed as matrix[row][column] in Pixman?


That's my understanding, yes.


A short'ish comment somewhere to say why you are doing this offsetting
would be nice, and that the offsetting is the reason to allocate a
margin.


Since you've been reworking the patches anyway, do you have enough
information to add the comments yourself or do you want me to do them?


One simple way round this is to apply a 0.5 pixel translation in the
transformation matrix, so that the pattern becomes:

d[0] =1.00 * s[0]
d[1] =0.50 * s[0] + 0.50 * s[1]
d[2] =  1.00 * s[1]
d[3] =  0.50 * s[1] + 0.50 * s[2]

but this is thwarted by the 8/65536ths of a pixel fudge factor. I can't
see why the fudge factor is needed at all, at least not in the affine
case; it's described as being to compensate for rounding errors, but
there is no rounding involved in fixed-point affine transforms.


Maybe the rounding refers to the pixman_fixed_to_int() calls? I could
imagine it is to guarantee that an xmin=0.5 gets really rounded to 0,
and xmax=0.5 gets rounded to 1.


There's a slightly better worked example in the later patch I was
thinking of:

http://lists.freedesktop.org/archives/pixman/2014-September/003380.html

As I say, we're getting a bit ahead of ourselves here, as there were
about 30 patches between the ones we're reworking at the moment and that
one. Søren did give some comments on it at the end of his review here:

http://lists.freedesktop.org/archives/pixman/2014-October/003457.html

which says the 8*pixman_fixed_e was about ensuring we didn't stray beyond
the bitmap bounds if a fast path happened to over-read pixels and then
multiply them by a 0 filter coefficient. I think we can probably cater
for that better by checking whether a bitmap starts or ends at a page
boundary, and whether we're reading the first and last pixels of the
image if it does.


To be honest, both cases you describe sound strange to me. Surely if I
use an affine transform matrix { 0.5 0 0; 0 1 0 }, I'd expect
d[0] = 0.5 * s[0] + 0.5 * s[1]
when assuming the box filter (if I got my terminology right)...


You're forgetting it's pixel-centres that are fed through the transform.
To be honest, I think it's already quite a headache for an application to
work out how to set up the transform in order to hit a cover scaled
fast path. Assume the reasonable case that you want to plot the whole of
an image of size x,y at a size m,n. You need to set the diagonals of the
transform to

floor(pixman_fixed_1*(x-1)/(m-1))
floor(pixman_fixed_1*(y-1)/(n-1))
1

and then solve for

/ 0.5 \   / 0.5 \
T . | 0.5 | = | 0.5 |
\ 1   /   \ 1   /

to find the translation coefficients that will ensure the top-left source
pixel aligns exactly to the top-left destination pixel.

To then require that the caller also knows that you need to nudge things
along by 8*pixman_fixed_e as well feels like it's requiring too much
knowledge of Pixman's internals to me. I didn't really like putting it in
affine-bench to begin with for this reason, but it doesn't work without
it. So I made do with removing it as quickly as I could - the original
intention was for it to be in the same patch series, but obviously the
patch series has grown too large and unwieldy for that the be the case
any more!

When I've seen timing breakdowns of which Pixman operations are being hit
by applications, I've seen far more scaled plots with repeat type PAD
than I'd really expect. My suspicions are that the bulk of these are down
to how difficult it is for applications to set up the transform for
maximum efficiency.


I think my hope was that since we're processing images of size 1920 *
1080 pixels, you wouldn't have any particular pixels persisting in the
cache from the previous iteration (unless perhaps if you're testing a
high-factor enlargement on a platform that has caches that don't
allocate on write).


Right, so is there any need to flush_cache() at all?


Possibly not. I was taking my cue from lowlevel-blt-bench again, where it
flushes the cache between each type of test. And looking at the
implementation of flush_cache() again now, I note that due to the action
of allocate-on-write and physically mapped caches, it won't completely
flush most caches, only one page-size worth! Perhaps just delete it then.


 +bench (op, t, src_image, mask_image, dest_image, src_x, src_y, UINT32_MAX, 
500, n, t1, pixman_image_composite_wrapper);
 +bench (op, t, src_image, mask_image, dest_image, src_x, src_y, n, 
UINT32_MAX, NULL, t2, pixman_image_composite_empty);

[...]

I think that part I understood. My comment was about having a {} block
without any flow control statements for it. You are just using it to
scope some variables, which suggests that the block should actually be
a separate function.


Oh right. Well, I think 

Re: [Pixman] [PATCH 5/5] test: Add a new benchmarker targeting affine operations

2015-04-07 Thread Ben Avison

On Tue, 17 Mar 2015 11:12:53 -, Pekka Paalanen ppaala...@gmail.com wrote:

affine-bench should be added to .gitignore too.


OK.


+#define L2CACHE_SIZE (128 * 1024)


I see lowlevel-blt-bench.c also uses L2CACHE_SIZE, but where does it
come from? If it's really an arch/platform-independent constant, maybe
some test header could have it?


As you will no doubt have noticed, affine-bench is heavily inspired by
lowlevel-blt-bench. Admittedly it's a bit of a cheat to assume one L2
cache size, but in lowlevel-blt-bench, it's only used for sizing the
images used for the L2 test. For those purposes, it's only important that
it's a number big enough that the images are flushed out of the L1 cache
but small enough that they aren't flushed out to memory, so as long as
it's bigger than the largest L1 cache you're likely to encounter and no
smaller than the smallest L2 cache, then it'll do, and saves lots of
messy OS-specific code to determine the cache sizes.

Admittedly, now I look back on the use of the same constant in
affine-bench, I've taken the point of view that it's too difficult in the
presence of significant transformation coefficients to calculate image
sizes that sit within a particular cache, so I'm only bothering to test
the memory-constrained case. To do that, I'm using L2CACHE_SIZE to size
an array used to flush the L2 cache - but that needs the *largest*
typical L2 cache size, not the smallest, so 128K won't cut it here.

While I have easy access to the cache sizes on CPUs that I'm actively
targeting, a general-purpose benchmarker should probably be suitable for
all architectures. Should we read it from the OS somehow? (Anyone know
how best to do that for, say, Linux, Windows, OS X and iOS, in that case?)


Looks like Pixman style is using separate statements for struct
definition and typedefs.


Not exclusively - I think the bits I've looked at usually do, but now
that I count them, I grant you the most common form is separate typedefs.
This particular example was cut-and-pasted from pixman.c but can easily
be changed.


+static pixman_bool_t
+compute_transformed_extents (pixman_transform_t *transform,
+ const pixman_box32_t *extents,
+ box_48_16_t *transformed)


CODING_STYLE is asking for alignment here for the arg names.


This was also a cut-and-paste, but good point.


If there is no transform, why not return the original extents?
These have been reduced by a half unit in all four directions.


It's more obviously relevant in the bilinear scaling case, but a pixel's
colour is considered to apply to the midpoint of each pixel. Thus an
image of width N is specifying colours at ordinates 0.5, 1.5, 2.5 ...
(N-1.5), (N-0.5). Therefore, when you're working out which source pixels
are required to generate a range of output pixels, it's 0.5 and (N-0.5)
that you need to feed through any transformation matrix. If the no
transform case were special-cased, you'd then need to special-case it
again in the calling function because the 0.5 offset applies (albeit at
different scaling factors) to both source and destination images, so the
caller will be adding the 0.5 pixel border back on again.

In affine-bench, I calculate image sizes based on bilinear scaling for
simplicity, since a source or mask image big enough to satisfy COVER_CLIP
for bilinear scaling will also do so for nearest scaling.

[compute_transformed_extents]

+tx1 = ty1 = INT64_MAX;
+tx2 = ty2 = INT64_MIN;
+
+for (i = 0; i  4; ++i)
+{
+pixman_fixed_48_16_t tx, ty;
+pixman_vector_t v;
+
+v.vector[0] = (i  0x01)? x1 : x2;
+v.vector[1] = (i  0x02)? y1 : y2;
+v.vector[2] = pixman_fixed_1;
+
+if (!pixman_transform_point (transform, v))
+return FALSE;
+
+tx = (pixman_fixed_48_16_t)v.vector[0];
+ty = (pixman_fixed_48_16_t)v.vector[1];


This works only for affine. This is called affine-bench, so that is
natural, yet might be nice to add an assert or something to guarantee
no-one will copy this code to something else that uses projective
transforms.


I might be missing something, but it looks to me like
pixman_transform_point (through its subroutine
pixman_transform_point_31_16) does handle projective transforms, and what
this code segment is doing is iterating over the four corners of the
destination rectangle and finding the maximum and minimum source x/y by
considering the transformed position of each corner independently. Surely
the whole set of destination pixels, once passed through even a
projective matrix, must still lie within this bounding box?

[create_image]

Too long line, needs breaking up.


OK.

[various places]

Add empty line.


OK.


+*bits = aligned_malloc (4096, stride * height);


Is there a #define to get the 4096 from? I assume it's page size? What
if the platform has big pages?


I assume the same thing. It's quite a common page size as I understand
it. I know it's not 

Re: [Pixman] [PATCH 5/5] test: Add a new benchmarker targeting affine operations

2015-03-17 Thread Pekka Paalanen
On Tue,  3 Mar 2015 15:24:20 +
Ben Avison bavi...@riscosopen.org wrote:

 ---
  test/Makefile.sources |1 +
  test/affine-bench.c   |  330 
 +
  2 files changed, 331 insertions(+), 0 deletions(-)
  create mode 100644 test/affine-bench.c
 
 diff --git a/test/Makefile.sources b/test/Makefile.sources
 index c20c34b..8b0e855 100644
 --- a/test/Makefile.sources
 +++ b/test/Makefile.sources
 @@ -37,6 +37,7 @@ OTHERPROGRAMS = \
   radial-perf-test\
  check-formats   \
   scaling-bench   \
 + affine-bench\
   $(NULL)

affine-bench should be added to .gitignore too.

  
  # Utility functions
 diff --git a/test/affine-bench.c b/test/affine-bench.c
 new file mode 100644
 index 000..55478d6
 --- /dev/null
 +++ b/test/affine-bench.c
 @@ -0,0 +1,330 @@
 +/*
 + * Copyright © 2014 RISC OS Open Ltd
 + *
 + * Permission to use, copy, modify, distribute, and sell this software and 
 its
 + * documentation for any purpose is hereby granted without fee, provided that
 + * the above copyright notice appear in all copies and that both that
 + * copyright notice and this permission notice appear in supporting
 + * documentation, and that the name of the copyright holders not be used in
 + * advertising or publicity pertaining to distribution of the software 
 without
 + * specific, written prior permission.  The copyright holders make no
 + * representations about the suitability of this software for any purpose.  
 It
 + * is provided as is without express or implied warranty.
 + *
 + * THE COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO THIS
 + * SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
 + * FITNESS, IN NO EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY
 + * SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
 + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN
 + * AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING
 + * OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS
 + * SOFTWARE.
 + *
 + * Author:  Ben Avison (bavi...@riscosopen.org)
 + */
 +
 +#include stdio.h
 +#include stdlib.h
 +#include string.h
 +#include ctype.h
 +#include stdint.h
 +#include utils.h
 +
 +#ifdef HAVE_GETTIMEOFDAY
 +#include sys/time.h
 +#else
 +#include time.h
 +#endif
 +
 +#define WIDTH  1920
 +#define HEIGHT 1080
 +#define L2CACHE_SIZE (128 * 1024)

Hi,

I see lowlevel-blt-bench.c also uses L2CACHE_SIZE, but where does it
come from? If it's really an arch/platform-independent constant, maybe
some test header could have it?

 +
 +typedef struct
 +{
 +pixman_fixed_48_16_tx1;
 +pixman_fixed_48_16_ty1;
 +pixman_fixed_48_16_tx2;
 +pixman_fixed_48_16_ty2;
 +} box_48_16_t;

Looks like Pixman style is using separate statements for struct
definition and typedefs.

 +
 +static pixman_bool_t
 +compute_transformed_extents (pixman_transform_t *transform,
 + const pixman_box32_t *extents,
 + box_48_16_t *transformed)

CODING_STYLE is asking for alignment here for the arg names.

 +{
 +pixman_fixed_48_16_t tx1, ty1, tx2, ty2;
 +pixman_fixed_t x1, y1, x2, y2;
 +int i;
 +
 +x1 = pixman_int_to_fixed (extents-x1) + pixman_fixed_1 / 2;
 +y1 = pixman_int_to_fixed (extents-y1) + pixman_fixed_1 / 2;
 +x2 = pixman_int_to_fixed (extents-x2) - pixman_fixed_1 / 2;
 +y2 = pixman_int_to_fixed (extents-y2) - pixman_fixed_1 / 2;
 +
 +if (!transform)
 +{
 +transformed-x1 = x1;
 +transformed-y1 = y1;
 +transformed-x2 = x2;
 +transformed-y2 = y2;

If there is no transform, why not return the original extents?
These have been reduced by a half unit in all four directions.

 +
 +return TRUE;
 +}
 +
 +tx1 = ty1 = INT64_MAX;
 +tx2 = ty2 = INT64_MIN;
 +
 +for (i = 0; i  4; ++i)
 +{
 +pixman_fixed_48_16_t tx, ty;
 +pixman_vector_t v;
 +
 +v.vector[0] = (i  0x01)? x1 : x2;
 +v.vector[1] = (i  0x02)? y1 : y2;
 +v.vector[2] = pixman_fixed_1;
 +
 +if (!pixman_transform_point (transform, v))
 +return FALSE;
 +
 +tx = (pixman_fixed_48_16_t)v.vector[0];
 +ty = (pixman_fixed_48_16_t)v.vector[1];

This works only for affine. This is called affine-bench, so that is
natural, yet might be nice to add an assert or something to guarantee
no-one will copy this code to something else that uses projective
transforms.

Or, maybe having affine as part of the function name?

 +
 +if (tx  tx1)
 +tx1 = tx;
 +if (ty  ty1)
 +ty1 = ty;
 +if (tx  tx2)
 +tx2 = tx;
 +if (ty  ty2)
 +ty2 = ty;
 +}
 +
 +transformed-x1 = tx1;
 +transformed-y1 = ty1;
 +transformed-x2 = tx2;
 +transformed-y2 = ty2;
 +
 +return TRUE;
 +}
 +