Re: [Mypaint-discuss] Surface optimizations proposed for merging

2012-11-21 Thread Jon Nordby
If there are no objections, I will merge the surface-optimizations branch to
master tomorrow or Friday. The performance improvements are well
documented by now, and the branch has had at least some testing by people
other than me.
Till found a minor issue in the brushlib tests, and Martin found one when using
the library a bit differently from how MyPaint does. I will fix both.
Any other issues are, I think, best found and ironed out in master.

-- 
Jon Nordby - www.jonnor.com
___
Mypaint-discuss mailing list
Mypaint-discuss@gna.org
https://mail.gna.org/listinfo/mypaint-discuss


Re: [Mypaint-discuss] Surface optimizations proposed for merging

2012-11-21 Thread Martin Renold
No objections from me. Better merge it soon so it gets more testing. There
is quite a lot of new C code. If a bug is hiding in it, we may see a crash
while painting rather than just an exception dialog.

Regards
Martin

On Wed, Nov 21, 2012 at 04:58:37PM +0100, Jon Nordby wrote:
 If there are no objections I will merge the surface-optimizations branch to
 master tomorrow or Friday. The performance improvements are pretty well
 documented by now, and it has at least got some testing from other people
 than me.
 Till found a minor issue in the brushlib tests, and Martin one when using
 it a bit differently from how we do in MyPaint. I will fix those.
 Any other issues will be best tested and ironed out in master I think.
 
 -- 
 Jon Nordby - www.jonnor.com



-- 
Martin Renold



Re: [Mypaint-discuss] Surface optimizations proposed for merging

2012-11-21 Thread Jon Nordby
On 21 November 2012 20:25, Martin Renold martin...@gmx.ch wrote:

 No objections from me. Better merge it soon so it gets more testing. There
 is quite some new C code.  If there is a bug hiding we may see a crash while
 painting, not just an exception dialog.

Merged. People: test the hell out of it!

-- 
Jon Nordby - www.jonnor.com


Re: [Mypaint-discuss] Surface optimizations proposed for merging

2012-11-19 Thread Jon Nordby
On 18 November 2012 03:12, Jon Nordby jono...@gmail.com wrote:

 * After the changes, GEGL-based backend is circa 30% faster than the
 Python-based backend with 1 thread, and twice as fast with 2 threads.
 A quad-core CPU with 4 threads will have an even higher speedup.


Till tested the code on a 6-core AMD Phenom II. In these raw performance
tests, the GEGL-based backend is up to 3-4 times as fast as the
Python-based one with 6 threads.

http://www.jonnor.com/files/temp/mypaint-brushengine-opt-gegl-vs-py.png
http://www.jonnor.com/files/temp/mypaint-brushengine-opt-gegl-vs-py.txt

-- 
Jon Nordby - www.jonnor.com


Re: [Mypaint-discuss] Surface optimizations proposed for merging

2012-11-19 Thread Jon Nordby
On 18 November 2012 03:12, Jon Nordby jono...@gmail.com wrote:

 I have some more ideas to further improve performance, and am working
 to document these now.

Now documented in the surface-optimizations branch, file
brushlib/PERFORMANCE:
http://gitorious.org/mypaint/mypaint/blobs/surface-optimizations/brushlib/PERFORMANCE

Here are the main points:
=== TODO: Improve vectorization ===
=== TODO: More efficient serial code ===
=== TODO: Try different tile sizes ===
=== IDEA: Dab masks cache ===
=== IDEA: Make use of GPU processing: OpenCL and OpenGL ===

-- 
Jon Nordby - www.jonnor.com


[Mypaint-discuss] Surface optimizations proposed for merging

2012-11-17 Thread Jon Nordby
Hi all,
I have finished the last missing pieces of the surface optimizations I
started a while back. The changes are not very invasive, but they could
use some real-life testing before going into master. If no issues are
found, I'd like them to be part of the MyPaint 1.1 release.

The code is in the surface-optimizations branch of the mainline repository:
http://gitorious.org/mypaint/mypaint/commits/surface-optimizations
Please test! (Check out the branch, then build and run MyPaint as normal.)

== Changes ==
The optimizations follow a three-pronged strategy:

1. Reordering of data access to minimize fetching and updating of tiles.
2. Coarse-grained parallelism using multithreading via OpenMP directives.
3. Fine-grained parallelism using SSE via GCC auto-vectorization.

The MyPaint surface API has a concept of an atomic transaction:
surface.begin_atomic() and surface.end_atomic(). Inside such a
transaction, we call brush.stroke_to(surface, ...) each time there is
a motion event on the canvas. Depending on the brush configuration and
current state, this may result in 0 to N surface.draw_dab() calls, where
N can be on the order of 10-100.
Previously, each draw_dab() call would fetch the affected tiles,
process the draw_dab operation and write the results back to the tiles.
When subsequent draw_dab() calls affected the same tiles, a tile could be
fetched and updated up to N times in one transaction where once would
suffice, i.e. up to N-1 redundant fetch/update round trips per tile.
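To make the cost difference concrete, here is a small C sketch that only counts tile fetches under the two strategies. It is a hypothetical model, not brushlib code: the DabFootprint type, MAX_TILES and both counting functions are made up for illustration, and a dab's footprint is simplified to a contiguous range of tile indices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a dab touches the contiguous tile indices
 * [first_tile, last_tile] on a surface of at most MAX_TILES tiles. */
#define MAX_TILES 1024

typedef struct {
    int first_tile;
    int last_tile;
} DabFootprint;

/* Old scheme: every draw_dab() fetches and writes back each tile it touches. */
static int count_fetches_eager(const DabFootprint *dabs, size_t n_dabs)
{
    int fetches = 0;
    for (size_t i = 0; i < n_dabs; i++)
        fetches += dabs[i].last_tile - dabs[i].first_tile + 1;
    return fetches;
}

/* New scheme: each touched tile is fetched and written back once, at end_atomic(). */
static int count_fetches_deferred(const DabFootprint *dabs, size_t n_dabs)
{
    bool touched[MAX_TILES] = { false };
    int fetches = 0;
    for (size_t i = 0; i < n_dabs; i++) {
        for (int t = dabs[i].first_tile; t <= dabs[i].last_tile; t++) {
            assert(t >= 0 && t < MAX_TILES);
            if (!touched[t]) {
                touched[t] = true;
                fetches++;
            }
        }
    }
    return fetches;
}
```

With N dabs that all land on the same tile, the eager count is N and the deferred count is 1, so N-1 fetch/update round trips are saved for that tile.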

Now, each time draw_dab() is called, an operation struct is added to a
queue for each of the affected tiles before returning. No processing
is done at this point. When end_atomic() is called to complete the
transaction, the tiles that have pending operations are distributed
evenly among the processing threads. The processing of a tile is
completely independent of other tiles, allowing it to be done in a
lock-free manner.
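A minimal C sketch of the queue-then-flush pattern described above, assuming a drastically simplified surface: a tile is a small float array and a queued operation just adds a constant. Surface, Tile, DabOp, surface_new(), surface_draw_dab(), tile_flush() and surface_end_atomic() are hypothetical names, not the actual brushlib API. The point is the structure: draw_dab() only appends to per-tile queues, and end_atomic() hands whole tiles to OpenMP threads, so no two threads ever write to the same tile.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified model: a tile is TILE_PIXELS floats and a
 * queued dab operation just adds `opacity` to every pixel of the tile. Real
 * operations would carry center, radius, hardness, color and so on. */
#define TILE_PIXELS 64
#define N_TILES 16
#define MAX_OPS_PER_TILE 128

typedef struct {
    float opacity;
} DabOp;

typedef struct {
    float pixels[TILE_PIXELS];
    DabOp ops[MAX_OPS_PER_TILE]; /* pending operations, in submission order */
    int n_ops;
} Tile;

typedef struct {
    Tile tiles[N_TILES];
} Surface;

static Surface *surface_new(void)
{
    return calloc(1, sizeof(Surface)); /* all pixels 0, no pending operations */
}

/* draw_dab(): enqueue the operation on every affected tile and return.
 * No pixel is touched here. */
static void surface_draw_dab(Surface *s, int first_tile, int last_tile, float opacity)
{
    for (int t = first_tile; t <= last_tile; t++) {
        assert(t >= 0 && t < N_TILES);
        Tile *tile = &s->tiles[t];
        assert(tile->n_ops < MAX_OPS_PER_TILE);
        tile->ops[tile->n_ops++].opacity = opacity;
    }
}

/* Apply, in order, and then clear the pending operations of a single tile.
 * Only this tile is read or written, so tiles can be flushed concurrently. */
static void tile_flush(Tile *tile)
{
    for (int i = 0; i < tile->n_ops; i++)
        for (int p = 0; p < TILE_PIXELS; p++)
            tile->pixels[p] += tile->ops[i].opacity;
    tile->n_ops = 0;
}

/* end_atomic(): split the tiles evenly across OpenMP threads. Without
 * -fopenmp the pragma is ignored and the loop runs serially. */
static void surface_end_atomic(Surface *s)
{
    #pragma omp parallel for schedule(static)
    for (int t = 0; t < N_TILES; t++) {
        if (s->tiles[t].n_ops > 0)
            tile_flush(&s->tiles[t]);
    }
}
```

Built with gcc -fopenmp, the flush loop is split across OMP_NUM_THREADS threads, and schedule(static) gives each thread an equal share of tile indices. For brevity the sketch scans every tile; a real surface would rather keep a list of just the tiles that have pending operations.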

When a get_color() request is made by the brush engine during a
surface transaction, the pending draw_dab operations on the affected
tiles must be flushed to return the correct value. Both the flushing
and the color calculation are done multi-threaded, in the same way as
above.
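The same idea applied to get_color() can be sketched as follows, again with a hypothetical, simplified model: a tile holds a single value plus a queue of pending additive operations, and the "color" is just the mean of the sampled tiles. Only the tiles under the sample area are flushed; pending work on other tiles stays queued until end_atomic(). Tile, tile_flush() and surface_get_color() are illustrative names, not brushlib's.

```c
#include <assert.h>

/* Hypothetical, simplified model: a tile stores a single value (standing in
 * for its pixels) plus a queue of pending additive dab operations. */
#define N_TILES 8
#define MAX_OPS 32

typedef struct {
    float value;
    float pending[MAX_OPS];
    int n_pending;
} Tile;

/* Apply and clear the pending operations of one tile. */
static void tile_flush(Tile *tile)
{
    for (int i = 0; i < tile->n_pending; i++)
        tile->value += tile->pending[i];
    tile->n_pending = 0;
}

/* get_color(): flush only the tiles under the sample area, then average
 * them. Both loops are split across threads when built with -fopenmp. */
static float surface_get_color(Tile *tiles, int first_tile, int last_tile)
{
    assert(0 <= first_tile && first_tile <= last_tile && last_tile < N_TILES);

    #pragma omp parallel for
    for (int t = first_tile; t <= last_tile; t++)
        tile_flush(&tiles[t]);

    float sum = 0.0f;
    #pragma omp parallel for reduction(+:sum)
    for (int t = first_tile; t <= last_tile; t++)
        sum += tiles[t].value;

    return sum / (float)(last_tile - first_tile + 1);
}
```

The reduction(+:sum) clause gives each thread a private partial sum that OpenMP combines at the end, so the averaging step needs no lock either.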

Within each thread, SSE-based vectorization is used to process a tile.
Currently this is limited to part of the brush mask calculation, as
the run-length encoding of the masks makes it difficult to
auto-vectorize the rest of the mask calculation and the
blending/compositing.
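For illustration, this is the kind of loop GCC's auto-vectorizer handles well: a fixed trip count, contiguous float arrays, no dependencies between iterations and a simple select instead of data-dependent control flow. It is a hypothetical sketch: the linear falloff stands in for MyPaint's real mask formula, and TILE_SIZE and dab_mask_row() are assumed names. A run-length encoded mask instead has data-dependent run lengths and branches, which is what makes the rest of the pipeline hard to vectorize automatically.

```c
#include <assert.h>

#define TILE_SIZE 64 /* assumption: tiles are TILE_SIZE pixels wide */

/* Fill one tile row of a dab mask with a simple linear falloff in the
 * normalized squared distance: 1 at the dab center, 0 at and beyond
 * `radius`. The falloff is a stand-in, not MyPaint's actual mask formula. */
static void dab_mask_row(float *mask, float row_y,
                         float center_x, float center_y, float radius)
{
    assert(radius > 0.0f);
    const float inv_r2 = 1.0f / (radius * radius);
    const float dy = row_y - center_y;
    const float dy2 = dy * dy;

    /* Fixed trip count, unit-stride stores, no dependency between
     * iterations: a good candidate for GCC auto-vectorization. */
    for (int x = 0; x < TILE_SIZE; x++) {
        const float dx = (float)x - center_x;
        const float rr = (dx * dx + dy2) * inv_r2; /* normalized squared distance */
        const float opa = 1.0f - rr;
        mask[x] = opa > 0.0f ? opa : 0.0f; /* select the vectorizer can if-convert */
    }
}
```

Building with -O3 (or -O2 -ftree-vectorize) on x86-64 should let GCC emit SSE code for this loop; recent GCC versions report which loops were vectorized when given -fopt-info-vec.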

== Results ==
These results are from my laptop, running an up-to-date Arch Linux. CPU:
Dual-core Intel i5 M520 @ 2.4 GHz, 6 GB RAM

Note: this benchmarks the *raw* surface rendering performance. The
user *may* experience speed-ups similar to those shown here, but
only if layer compositing and rendering to screen are not a
bottleneck.

http://jonnor.com/files/temp/mypaint-brushengine-opt.png
http://jonnor.com/files/temp/mypaint-brushengine-opt.txt

Take-aways:
* 20% to 50% performance improvements for larger brushes (16 px+) on
the currently used Python-based backend.
* Performance does not regress significantly for small brushes; the
worst slowdown found was 4%.
* After the changes, GEGL-based backend is circa 30% faster than the
Python-based backend with 1 thread, and twice as fast with 2 threads.
A quad-core CPU with 4 threads will have an even higher speedup.

To reproduce:
  # enable_gegl=true builds the GEGL backend (requires babl and gegl from git)
  scons enable_gegl=true enable_openmp=true
  cd brushlib/tests
  export PYTHONPATH=../../lib:../..
  export LD_LIBRARY_PATH=../..
  export GEGL_SWAP=RAM
  export OMP_NUM_THREADS=2
  ../../lib/test-python-surface # current python-based backend
  ./test-gegl-surface # GEGL backend

Look inside mypaint-test-surface.c to see/change the different test cases.

== Future ==
Given that the GEGL backend has significantly higher raw
performance, I hope that after we release MyPaint 1.1 we can start the
transition to using it instead of our current backend.

I have some more ideas to further improve performance, and am working
on documenting these now.

-- 
Jon Nordby - www.jonnor.com
