From: Søren Sandmann Pedersen s...@redhat.com
When calling a fast path, we need to pass the corresponding
implementation since it might contain information necessary to run the
fast path.
---
pixman/pixman.c | 26 --
1 files changed, 16 insertions(+), 10 deletions
From: Søren Sandmann Pedersen s...@redhat.com
The do_composite() function is a lot more readable this way.
---
pixman/pixman.c | 200 +-
1 files changed, 107 insertions(+), 93 deletions(-)
diff --git a/pixman/pixman.c b/pixman/pixman.c
index
Siarhei Siamashka siarhei.siamas...@gmail.com writes:
Overall looks like a good fix, a few comments below.
Thanks for the comments. I'll send a new patch with a long commit log
as a follow-up to this message (provided I can make it work with
git-send-email), but I'll reply to some specifics
From: Søren Sandmann Pedersen s...@redhat.com
This extends scaling-crash-test to test some more things:
- All combinations of NEAREST/BILINEAR/CONVOLUTION filters and
NORMAL/PAD/REFLECT repeat modes.
- Tests various scale factors very close to 1/7th such that the source
area is very close
Would it be possible instead to add a new flag OPAQUE_SAMPLES
that would be set whenever the image format is opaque, and then
use it along with SAMPLES_COVER_CLIP to add the OPAQUE flag before
strength reducing the operator?
That would help all the backends, including the general one,
From: Søren Sandmann Pedersen s...@redhat.com
This flag is set whenever the pixels of a bits image don't have an
alpha channel. Together with FAST_PATH_SAMPLES_COVER_CLIP it implies
that the image effectively is opaque, so we can do operator reductions
such as OVER->SRC.
---
pixman/pixman-image.c
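
As a rough illustration of the reduction described in this message, the sketch below combines two per-image flags into an "opaque" flag and then reduces OVER to SRC. All names and bit values (SKETCH_*) are hypothetical stand-ins invented for this example; they are not pixman's actual definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits; the real flag names and values are not
 * given in the message above. */
#define SKETCH_SAMPLES_COVER_CLIP (1u << 0) /* no sample falls outside the image */
#define SKETCH_SAMPLES_OPAQUE     (1u << 1) /* pixel format has no alpha channel */
#define SKETCH_IS_OPAQUE          (1u << 2) /* image is effectively opaque */

typedef enum { SKETCH_OP_SRC, SKETCH_OP_OVER } sketch_op_t;

/* An image is effectively opaque when its pixels carry no alpha and
 * every sample that will be read lies inside the image. */
static uint32_t sketch_add_opaque_flag(uint32_t flags)
{
    const uint32_t both = SKETCH_SAMPLES_COVER_CLIP | SKETCH_SAMPLES_OPAQUE;

    if ((flags & both) == both)
        flags |= SKETCH_IS_OPAQUE;
    return flags;
}

/* OVER with an opaque source produces the same result as SRC. */
static sketch_op_t sketch_reduce_operator(sketch_op_t op, uint32_t src_flags)
{
    assert(op == SKETCH_OP_SRC || op == SKETCH_OP_OVER);

    if (op == SKETCH_OP_OVER && (src_flags & SKETCH_IS_OPAQUE))
        return SKETCH_OP_SRC;
    return op;
}
```

Either flag alone is not enough: an alpha-less format sampled outside its bounds can still yield transparent pixels, which is why the sketch requires both bits.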
From: Søren Sandmann Pedersen s...@redhat.com
It doesn't make sense in other cases, and the computation would make
use of image->bits.{width,height}, which leads to uninitialized memory
accesses when the image isn't of type BITS.
---
pixman/pixman.c | 17 ++---
1 files changed, 10
This function is an implementation of the X server request
Trapezoids. That request is what the X backend of cairo is using all
the time; by moving it into pixman we can hopefully make it faster.
---
pixman/pixman-trap.c | 87 ++
pixman/pixman.h
From: Søren Sandmann Pedersen s...@redhat.com
A CRC32 based test program to check that pixman_composite_trapezoids()
actually works.
---
test/Makefile.am|5 +
test/composite-traps-test.c | 253 +++
2 files changed, 258 insertions(+), 0
From: Søren Sandmann Pedersen s...@redhat.com
The Render X extension can draw triangles as well as trapezoids, but
the implementation has always converted them to trapezoids. This patch
moves the X server's triangle conversion code into pixman, where we
can reuse the pixman_composite_trapezoid
From: Søren Sandmann Pedersen s...@redhat.com
When the source is opaque and the destination is alpha only, we can
avoid the temporary mask and just add the trapezoids directly.
---
pixman/pixman-trap.c | 133 -
1 files changed, 76 insertions
From: Søren Sandmann Pedersen s...@redhat.com
This allows some more code to be deleted from the X server. The
implementation consists of converting to trapezoids, and is shared
with pixman_composite_triangles().
---
pixman/pixman-trap.c | 61
From: Søren Sandmann Pedersen s...@redhat.com
The fb version simply calls the new pixman_composite_triangles(). This
allows us to get rid of miCreateAlphaPicture().
Signed-off-by: Søren Sandmann s...@redhat.com
---
fb/fbpict.c |1 +
fb/fbpict.h | 10 +
fb/fbtrap.c | 109
From: Søren Sandmann Pedersen s...@redhat.com
This allows the remaining triangle-to-trap conversion code to be
deleted.
Signed-off-by: Søren Sandmann s...@redhat.com
---
fb/fbtrap.c | 91 ++-
1 files changed, 9 insertions(+), 82
Here is a patch series that removes all use of MMX from
pixman-sse2.c. This avoids all the emms issues and is likely also a
speedup on Windows x64, where MMX intrinsics are not supported and
therefore had to be emulated.
b/configure.ac|2
b/pixman/pixman-sse2.c
From: Søren Sandmann Pedersen s...@redhat.com
It's not necessary now that the file doesn't use MMX instructions.
---
configure.ac |2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/configure.ac b/configure.ac
index 5242799..8d96647 100644
--- a/configure.ac
+++ b
From: Søren Sandmann Pedersen s...@redhat.com
---
pixman/pixman-sse2.c | 137 --
1 files changed, 0 insertions(+), 137 deletions(-)
diff --git a/pixman/pixman-sse2.c b/pixman/pixman-sse2.c
index 0753b6d..286dea8 100644
--- a/pixman/pixman-sse2.c
From: Søren Sandmann Pedersen s...@redhat.com
Also make pixman_fill_sse2() static.
---
pixman/pixman-sse2.c | 18 --
1 files changed, 12 insertions(+), 6 deletions(-)
diff --git a/pixman/pixman-sse2.c b/pixman/pixman-sse2.c
index 0509613..88287b4 100644
--- a/pixman/pixman
I forgot to CC pixman@lists.freedesktop.org on the following
patch. The patch is necessary to make trapezoid rendering directly to
X windows work and also makes the pixman_composite_trapezoids() API
more similar to pixman_image_composite(). See this thread:
The following patches add a noop implementation, which is used as
topmost in the implementation hierarchy. It is supposed to contain
iterators and compositing routines that don't do anything. For
example, there is a compositing fast path for the DST operator.
This is useful because it allows more
From: Søren Sandmann Pedersen s...@redhat.com
This new implementation is ahead of all other implementations in the
fallback chain and is supposed to contain operations that don't
require any work. For example, it might contain a fast path for the
DST operator that doesn't actually do anything
From: Søren Sandmann Pedersen s...@redhat.com
It will at some point become useful to have CPU specific destination
iterators. However, a problem with that is that such iterators should
not be used if we can composite directly in the destination image.
By moving the noop destination iterator
From: Søren Sandmann Pedersen s...@redhat.com
Iterating a NULL image returns NULL for all scanlines. This may as
well be done in the noop iterator.
---
pixman/pixman-implementation.c | 12 +---
pixman/pixman-noop.c | 24
2 files changed, 17
From: Søren Sandmann Pedersen s...@redhat.com
When the image is a8r8g8b8 and not transformed, and the fetched
rectangle is within the image bounds, scanlines can be fetched by
simply returning a pointer instead of copying the bits.
---
pixman/pixman-noop.c | 39
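
The zero-copy fetch described above might look roughly like the sketch below. The struct and function names are hypothetical, the row stride is assumed to be counted in 32-bit pixels, and the image is assumed to be untransformed a8r8g8b8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified image description (not pixman's bits_image_t). */
typedef struct {
    uint32_t *bits;      /* first pixel of row 0 */
    int       width;     /* in pixels */
    int       height;    /* in rows */
    int       rowstride; /* in uint32_t units */
} sketch_image_t;

/* Return a pointer straight into the image for the requested span, or
 * NULL when the span is not fully inside the image (in which case a
 * copying fetcher would have to be used instead). */
static const uint32_t *
sketch_get_scanline(const sketch_image_t *img, int x, int y, int width)
{
    assert(img != NULL && width > 0);

    if (x < 0 || y < 0 || y >= img->height || x > img->width - width)
        return NULL;
    return img->bits + (ptrdiff_t)y * img->rowstride + x;
}
```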
From: Søren Sandmann Pedersen s...@redhat.com
---
pixman/pixman-cpu.c | 22 +++---
pixman/pixman-private.h |2 ++
2 files changed, 13 insertions(+), 11 deletions(-)
diff --git a/pixman/pixman-cpu.c b/pixman/pixman-cpu.c
index aa9036f..a0d2f8c 100644
--- a/pixman/pixman
git://people.freedesktop.org/~sandmann/pixman
in the branch cpudetectfiles.
Hi,
The following patches contain some cleanups to the CPU detection in
general, and some improvements to the x86 specific parts in particular.
I was looking at making use of some of the newer x86 SIMD
From: Søren Sandmann Pedersen s...@redhat.com
There is no reason to have pixman_have_feature functions when all
they do is call pixman_have_mips_feature().
Instead rename pixman_have_mips_feature() to have_feature() and call
it directly from _pixman_mips_get_implementations(). Also on
non-Linux
From: Søren Sandmann Pedersen s...@redhat.com
Similar to the x86 commit, this moves the ARM specific CPU detection
to its own file which exports a pixman_arm_get_implementations()
function that is supposed to be a noop on non-ARM.
---
pixman/Makefile.sources |1 +
pixman/pixman-arm.c
From: Søren Sandmann Pedersen s...@redhat.com
Extract the x86 specific parts of pixman-cpu.c and put them in their
own file called pixman-x86.c which exports one function
pixman_x86_get_implementations() that creates the MMX and SSE2
implementations. This file is supposed to be compiled on all
From: Søren Sandmann Pedersen s...@redhat.com
Organize pixman-arm.c such that each operating system/compiler exports
a detect_cpu_features() function that returns a bitmask with the
various features that we are interested in. A new function
have_feature() then calls this function, caches
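
A minimal sketch of the caching pattern this message describes is shown below. The feature names, the stub detection function, and the call counter are all hypothetical; a real detect_cpu_features() would query the operating system or the CPU.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical feature bits for illustration only. */
typedef enum {
    SKETCH_ARM_V7   = (1 << 0),
    SKETCH_ARM_NEON = (1 << 1)
} sketch_arm_features_t;

static int sketch_detect_calls = 0; /* only here to show that caching works */

/* Stand-in for the per-OS detection routine; pretends that NEON is
 * the only feature present. */
static sketch_arm_features_t sketch_detect_cpu_features(void)
{
    sketch_detect_calls++;
    return SKETCH_ARM_NEON;
}

/* Run detection on the first call, cache the bitmask, and answer all
 * later queries from the cache. */
static bool sketch_have_feature(sketch_arm_features_t feature)
{
    static bool initialized = false;
    static sketch_arm_features_t features;

    assert(feature != 0);

    if (!initialized) {
        features = sketch_detect_cpu_features();
        initialized = true;
    }
    return (features & feature) == feature;
}
```

Note that this sketch is not thread-safe; it only illustrates the detect-once, query-many structure.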
From: Søren Sandmann Pedersen s...@redhat.com
---
pixman/Makefile.sources |1 +
pixman/pixman-cpu.c | 77 +
pixman/pixman-mips.c| 110 +++
pixman/pixman-private.h |3 ++
4 files changed, 115 insertions
From: Søren Sandmann Pedersen s...@redhat.com
---
pixman/Makefile.sources|1 -
pixman/pixman-cpu.c| 79
pixman/pixman-implementation.c | 51 ++
3 files changed, 51 insertions(+), 80 deletions(-)
delete
From: Søren Sandmann Pedersen s...@redhat.com
---
pixman/Makefile.sources |1 +
pixman/pixman-cpu.c | 165 +---
pixman/pixman-ppc.c | 192 +++
pixman/pixman-private.h |3 +
4 files changed, 197
From: Søren Sandmann Pedersen s...@redhat.com
Get rid of the initialized and have_vmx static variables in
pixman-ppc.c. There is no point to them since CPU detection only
happens once per process.
On Linux, just read /proc/self/auxv instead of generating the filename
with getpid() and don't
From: Søren Sandmann Pedersen s...@redhat.com
A new function pixman_cpuid() is added that runs the cpuid instruction
and returns the results.
On GCC this function uses inline assembly that is written such that it
will work on both 32 and 64 bit. Compared to the old code, the only
difference
Hi,
The following patches change the 64 bit pipeline to use single precision
floating point channels instead.
The main benefit of this is that we get more range and precision so
that we can support HDR image formats such as half precision floating
point argb. Unlike 16 bpc, single precision floating
From: Søren Sandmann Pedersen s...@redhat.com
Comment out some formats in blitters-test that are going to rely on
floating point in some upcoming patches.
---
test/blitters-test.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/test/blitters-test.c b/test/blitters-test.c
From: Søren Sandmann Pedersen s...@redhat.com
In preparation for an upcoming change of the wide pipe to use floating
point, comment out some formats in glyph-test that are going to be
using floating point and update the CRC32 value to match.
---
test/glyph-test.c | 7 +--
1 file changed, 5
From: Søren Sandmann Pedersen s...@redhat.com
This file contains floating point implementations of combiners for all
pixman operators. These combiners operate on buffers containing single
precision floating point pixels stored in (a, r, g, b) order.
The combiners are added
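
To make the buffer layout concrete, here is a small sketch of what a float combiner for the OVER operator could look like on premultiplied pixels stored in (a, r, g, b) order. The function name is hypothetical and, as a simplification, the sketch ignores the mask argument that a full combiner would also take.

```c
#include <assert.h>
#include <stddef.h>

/* OVER on premultiplied float pixels, four floats per pixel in
 * (a, r, g, b) order:  dest = src + (1 - src.a) * dest  per channel. */
static void
sketch_combine_over_float(float *dest, const float *src, size_t n_pixels)
{
    assert(dest != NULL && src != NULL);

    for (size_t i = 0; i < n_pixels; ++i) {
        const float *s = src + 4 * i;
        float *d = dest + 4 * i;
        float inv_sa = 1.0f - s[0]; /* s[0] is the source alpha */

        for (int c = 0; c < 4; ++c)
            d[c] = s[c] + inv_sa * d[c];
    }
}
```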
From: Søren Sandmann Pedersen s...@redhat.com
Three new function pointer fields are added to bits_image_t:
fetch_scanline_float
fetch_pixel_float
store_scanline_float
similar to the existing 32 and 64 bit accessors. The fetcher_info_t
struct in pixman_access similarly gets
From: Søren Sandmann Pedersen s...@redhat.com
GdkPixbufs are not premultiplied, so when using them to display pixman
images, there are some unnecessary conversions going on: First the image
is converted to non-premultiplied, and then GdkPixbuf premultiplies
before sending the result to the X server
This will be useful for putting iterators into tables where they can
be looked up by iterator flags. Without this flag, wide iterators can
only be recognized by the absence of ITER_NARROW, which makes testing
for a match difficult.
---
pixman/pixman-general.c | 20 +---
Similar to the changes to noop, put all the iterators into a table of
pixman_iter_info_t and then do a generic search of that table during
iterator initialization.
---
pixman/pixman-sse2.c | 64
1 file changed, 35 insertions(+), 29 deletions(-)
Similar to the SSE2 commit, information about the iterators is stored
in a table of pixman_iter_info_t.
---
pixman/pixman-mmx.c | 64 +
1 file changed, 35 insertions(+), 29 deletions(-)
diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c
Similar to the SSE2 and MMX patches, this commit replaces a table of
fetcher_info_t with a table of pixman_iter_info_t, and similar to the
noop patch, both fast_src_iter_init() and fast_dest_iter_init() are
now doing exactly the same thing, so their code can be shared in a new
function called
A new field, 'iter_info', is added to the implementation struct, and
all the implementations store a pointer to their iterator tables in
it. A new function, _pixman_implementation_iter_init(), is then added
that searches those tables, and the new function is called in
pixman-general.c and
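
The table-driven lookup described in these messages could be sketched as follows. The struct layout, flag values, and names are hypothetical; in particular, a real pixman_iter_info_t would hold function pointers rather than a name string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical iterator flags. Having an explicit WIDE bit lets a
 * table entry be matched with a plain mask test. */
#define SKETCH_ITER_NARROW (1u << 0)
#define SKETCH_ITER_WIDE   (1u << 1)
#define SKETCH_ITER_SRC    (1u << 2)
#define SKETCH_ITER_DEST   (1u << 3)

typedef struct {
    uint32_t    iter_flags; /* every bit here must be set in the request */
    const char *name;       /* stand-in for the iterator's init function */
} sketch_iter_info_t;

static const sketch_iter_info_t sketch_iter_table[] = {
    { SKETCH_ITER_NARROW | SKETCH_ITER_SRC, "narrow source" },
    { SKETCH_ITER_WIDE   | SKETCH_ITER_SRC, "wide source"   },
    { 0, NULL } /* terminator */
};

/* Return the first entry whose flags are all present in 'requested',
 * or NULL if no entry matches. */
static const sketch_iter_info_t *
sketch_find_iter(const sketch_iter_info_t *table, uint32_t requested)
{
    assert(table != NULL);

    for (; table->name != NULL; ++table) {
        if ((table->iter_flags & requested) == table->iter_flags)
            return table;
    }
    return NULL;
}
```

Because each implementation only has to publish a table, the search loop itself can live in one shared place.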
The SSE2, MMX, and fast implementations all have a copy of the
function iter_init_bits_stride that computes an image buffer and
stride.
Move that function to pixman-utils.c and share it among all the
implementations.
---
pixman/pixman-fast-path.c | 19 +--
pixman/pixman-mmx.c
This new iterator uses the SSSE3 instructions pmaddubsw and pabsw to
implement a fast iterator for bilinear scaling.
There is a graph here recording the per-pixel time for various
bilinear scaling algorithms as reported by scaling-bench:
Here is a new version of the bilinear scaler that fixes Matt's and
Siarhei's comments and also uses movdqu instead of movdqa for the
writes to iter->buffer. This ensures that the iterator doesn't impose
new alignment restrictions that could interfere with the
direct-to-destination optimizations.
This new iterator uses the SSSE3 instructions pmaddubsw and pabsw to
implement a fast iterator for bilinear scaling.
There is a graph here recording the per-pixel time for various
bilinear scaling algorithms as reported by scaling-bench:
At the moment iter buffers are only guaranteed to be aligned to a 4
byte boundary. SIMD implementations benefit from the buffers being
aligned to 16 bytes, so ensure this is the case.
V2:
- Use uintptr_t instead of unsigned long
- Allocate 3 * SCANLINE_BUFFER_LENGTH bytes on the stack rather than just
This commit adds a new, empty SSSE3 implementation and the associated
build system support.
configure.ac: detect whether the compiler understands SSSE3
intrinsics and set up the required CFLAGS
Makefile.am: Add libpixman-ssse3.la
pixman-x86.c: Add X86_SSSE3 feature flag
By using this function instead of compute_crc32() the alpha masking
code and the call to image_endian_swap() are not duplicated.
---
test/affine-test.c | 12 ++--
test/composite-traps-test.c | 11 +--
test/scaling-test.c | 12 ++--
3 files changed, 5
Pixman supports negative strides, but up until now they haven't been
tested outside of stress-test. This commit adds testing of negative
strides to blitters-test, scaling-test, affine-test, rotate-test, and
composite-traps-test.
---
test/affine-test.c | 22 --
The affine-test, blitters-test, and scaling-test all have the ability
to print out the bytes of the destination image. Share this code by
moving it to utils.c.
At the same time make the code work correctly with negative strides.
---
test/affine-test.c | 12 +---
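
For readers unfamiliar with negative strides, the sketch below shows the addressing involved. It assumes a stride measured in bytes and a 'bits' pointer that refers to row 0; the helper name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the start of row y. With a negative stride, row 0 is the row
 * at the highest address and later rows sit below it in memory, so the
 * lowest byte of the image is NOT at 'bits'. */
static uint8_t *sketch_row(uint8_t *bits, int stride, int y)
{
    assert(bits != NULL && y >= 0);

    return bits + (ptrdiff_t)y * stride;
}
```

Code that dumps or checksums the image bytes has to walk the rows through a helper like this rather than assume one contiguous block starting at 'bits'.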
From: Søren Sandmann Pedersen s...@redhat.com
The generated fetchers for NEAREST, BILINEAR, and
SEPARABLE_CONVOLUTION filters are fast paths and so they belong in
pixman-fast-path.c
---
pixman/pixman-bits-image.c | 530
pixman/pixman-fast-path.c
The overall goal of the following patches is to make it more obvious
how the blend mode code relates to the specifications. To that end,
the comment for each blend routine is updated with some math that
shows how we go from specification to a formula that can deal with
premultiplied alpha, and the
Fix a bunch of spacing issues.
---
pixman/pixman-combine32.c | 112 +++---
1 file changed, 56 insertions(+), 56 deletions(-)
diff --git a/pixman/pixman-combine32.c b/pixman/pixman-combine32.c
index 3ac7576..be3cfdf 100644
--- a/pixman/pixman-combine32.c
This commit overhauls the comments in pixman-combine32.c regarding
blend modes:
- Add a link to the PDF supplement that clarifies the specification of
ColorBurn and ColorDodge
- Clarify how the formulas for premultiplied colors are derived from
the ones in the PDF specifications
- Write out
Change blend_color_dodge() to follow the math in the comment more
closely.
Note, the new code here is in some sense worse than the old code
because it can now underflow the unsigned variables when the source is
superluminescent and (as - s) is therefore negative. The old code was
careful to clamp
There are no semantic changes, just variable renames. The motivation
for these renames is so that the names are shorter and better match
the ones used in the comments.
---
pixman/pixman-combine32.c | 199 +++---
1 file changed, 99 insertions(+), 100
For superluminescent destinations, the old code could underflow in
uint32_t r = (ad - d) * as / s;
when (ad - d) was negative. The new code avoids this problem (and
therefore causes changes in the checksums of thread-test and
blitters-test), but it is likely still buggy due to the use of
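
The underflow mentioned above is ordinary unsigned wraparound. The sketch below contrasts the quoted expression with one possible guard. Clamping the difference to zero is an assumption made purely for illustration; the message does not say how the new code avoids the problem, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* The expression quoted in the message: if d > ad, (ad - d) wraps
 * around to a huge unsigned value instead of going negative. */
static uint32_t
sketch_naive(uint32_t ad, uint32_t d, uint32_t as, uint32_t s)
{
    assert(s != 0);
    return (ad - d) * as / s;
}

/* Illustrative guard (an assumption, not the actual fix): treat a
 * would-be negative difference as zero before multiplying. */
static uint32_t
sketch_guarded(uint32_t ad, uint32_t d, uint32_t as, uint32_t s)
{
    assert(s != 0);

    uint32_t diff = (d > ad) ? 0 : ad - d;
    return diff * as / s;
}
```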
Since a4c79d695d52c94647b1aff7 the constant
BILINEAR_INTERPOLATION_BITS must be strictly less than 8, so fix the
comment to say this, and also add a COMPILE_TIME_ASSERT in the
bilinear fetcher in pixman-fast-path.c
---
pixman/pixman-fast-path.c | 2 ++
pixman/pixman-private.h | 2 +-
2 files
This series addresses the comments by Bill and also changes
pixman-fast-path.c so that it picks NEAREST fast paths before
BILINEAR. (I noticed this because the new filter-reduction-test.c
failed to detect a bug that I deliberately introduced).
Søren
This new test tests a bunch of bilinear downscalings, where many have
a transformation such that the BILINEAR filter can be reduced to
NEAREST (and many don't).
A CRC32 is computed for all the resulting images and compared to a
known-good value for both 4-bit and 7-bit interpolation.
V2: Remove
This new test tests a bunch of bilinear downscalings, where many have
a transformation such that the BILINEAR filter can be reduced to
NEAREST (and many don't).
A CRC32 is computed for all the resulting images and compared to a
known-good value for both 4-bit and 7-bit interpolation.
The following two patches generalize the reduction of BILINEAR to
NEAREST based on the formula mentioned here:
https://lists.freedesktop.org/archives/pixman/2010-August/000321.html
Søren
From: Bill Spitzak
Instead of using the boundary of xformed rectangle, use the boundary
of xformed ellipse. This is much more accurate and less blurry. In
particular the filtering does not change as the image is rotated.
Signed-off-by: Bill Spitzak
From: Bill Spitzak
Signed-off-by: Bill Spitzak
Reviewed-by: Søren Sandmann
---
demos/scale.ui | 1 +
1 file changed, 1 insertion(+)
diff --git a/demos/scale.ui b/demos/scale.ui
index f6f6e89..d498d26 100644
--- a/demos/scale.ui
From: Bill Spitzak
Simpson's rule uses cubic curve fitting, with 3 samples defining each
cubic. This makes the weights of the samples be in a pattern of
1,4,2,4,2...4,1, and then dividing the result by 3.
The previous code was using weights of 1,2,0,6,0,6...,2,1.
With this fix the
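
For reference, the 1,4,2,4,...,4,1 weighting is the composite Simpson's rule, which can be sketched generically as below. This is a textbook version under assumed names, not the code from pixman-filter.c.

```c
#include <assert.h>

typedef double (*sketch_fn_t)(double);

/* Composite Simpson's rule over [a, b] with n equal intervals (n even).
 * The end points get weight 1, odd samples weight 4, even interior
 * samples weight 2, and the sum is scaled by h / 3. */
static double sketch_simpson(sketch_fn_t f, double a, double b, int n)
{
    assert(n >= 2 && n % 2 == 0);

    double h = (b - a) / n;
    double sum = f(a) + f(b);

    for (int i = 1; i < n; ++i)
        sum += ((i % 2) ? 4.0 : 2.0) * f(a + i * h);

    return sum * h / 3.0;
}

/* Example integrand; Simpson's rule is exact for polynomials up to
 * degree three. */
static double sketch_cubic(double x)
{
    return x * x * x;
}
```

Integrating sketch_cubic over [0, 2] with n = 4 gives 4, the exact value of the integral.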
When a BILINEAR filter is reduced to NEAREST, it is possible for both
types of fast paths to run; in this case, the NEAREST ones should be
preferred as that is the simpler filter.
Signed-off-by: Soren Sandmann
---
pixman/pixman-fast-path.c | 4 ++--
1 file changed, 2
From: Bill Spitzak
Rearranged so that the entire block of memory for the filter pair
is allocated first, and then filled in. Previous version allocated
and freed two temporary buffers for each filter and did an extra
memcpy.
v8: small refactor to remove the filter_width
From: Bill Spitzak
Only the triangle is discontinuous at 0. The other filters resemble a
cubic closely enough that Simpson's integration works without
splitting.
Changes by Søren: Rebase without the changes to the integral function,
update comment to match the new code.
From: Bill Spitzak
v11: Restored range checks
Signed-off-by: Bill Spitzak
Reviewed-by: Oded Gabbay
---
pixman/pixman-filter.c | 14 --
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/pixman/pixman-filter.c
Hi,
The following patch series contains those of Bill's patches that I
think are ready to be pushed to master, plus some other related
changes that I also think are ready.
01-03: These are patches to do more BILINEAR->NEAREST filter
reductions. They were inspired by a similar patch in
The convolution of two BOX filters is simply the length of the
interval where both are non-zero, so we can simply return width from
the integral() function because the integration region has already
been restricted to be such that both functions are non-zero on it.
This is both faster and more
This new test tests a bunch of bilinear downscalings, where many have
a transformation such that the BILINEAR filter can be reduced to
NEAREST (and many don't).
A CRC32 is computed for all the resulting images and compared to a
known-good value for both 4-bit and 7-bit interpolation.
V2: Remove
Generalize and simplify the code that reduces BILINEAR to NEAREST so
that the reduction happens for all affine transformations where
t00...t12 are integers and (t00 + t01) and (t10 + t11) are both
odd. This is a sufficient condition for the resulting transformed
coordinates to be exactly at the
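
The condition in this message can be checked directly on a 16.16 fixed-point matrix, as in the sketch below. The intuition: a destination pixel center sits at (x + 0.5, y + 0.5), so with integer coefficients the transformed coordinate is an integer plus (t00 + t01) / 2, which lands on a source pixel center exactly when the sum is odd. The type and function names are hypothetical stand-ins, and only the top two rows of the affine matrix are examined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t sketch_fixed_t; /* 16.16 fixed point */

static bool sketch_is_integer(sketch_fixed_t f)
{
    return (f & 0xffff) == 0;
}

/* t[i][j] holds rows 0 and 1 of an affine transformation. Returns true
 * when all six entries are integers and both (t00 + t01) and
 * (t10 + t11) are odd. */
static bool sketch_bilinear_reduces_to_nearest(const sketch_fixed_t t[2][3])
{
    assert(t != NULL);

    for (int i = 0; i < 2; ++i) {
        for (int j = 0; j < 3; ++j) {
            if (!sketch_is_integer(t[i][j]))
                return false;
        }

        /* Both values are multiples of 65536, so the division is exact. */
        int64_t sum = ((int64_t)t[i][0] + t[i][1]) / 65536;
        if (sum % 2 == 0)
            return false;
    }
    return true;
}
```

The identity matrix passes the test, while a pure 2x scale does not, because 2 * (x + 0.5) lands on a pixel boundary rather than a pixel center.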
From: Bill Spitzak
If enable-gnuplot is configured, then you can pipe the output of a
pixman-using program to gnuplot and get a continuously-updated plot of
the horizontal filter. This works well with demos/scale to test the
filter generation.
The plot is all the different
When a BILINEAR filter is reduced to NEAREST, it is possible for both
types of fast paths to run; in this case, the NEAREST ones should be
preferred as that is the simpler filter.
Signed-off-by: Soren Sandmann
---
pixman/pixman-fast-path.c | 4 ++--
1 file changed, 2
Generalize and simplify the code that reduces BILINEAR to NEAREST so
that the reduction happens for all affine transformations where
t00..t12 are integers and (t00 + t01) and (t10 + t11) are both
odd. This is a sufficient condition for the resulting transformed
coordinates to be exactly at the
81 matches