Thank you for taking a serious look at GEGL. I've trimmed away the
bits relating to the VIPS backend to focus on the performance numbers
you got, and will try to explain them.
On Sun, Apr 17, 2011 at 10:22 AM, <jcup...@gmail.com> wrote:
> Linked against gegl-vips with the operations set to exactly match
> gegl's processing, the same thing runs in 27s real, 38s user. So it
> looks like some tuning of the disc cache, or maybe even turning it off
> for batch processing, where you seldom need pixels more than once,
> could give gegl a very useful speedup here. libvips has a threading
> system which is on by default and does double-buffered write-behind,
> which also help.
On my Core 2 Duo 1.86GHz laptop I get 105s real, 41s user with default
settings. Setting GEGL_SWAP=RAM in the environment to turn off the disk
swapping of tiles makes it run in 43s real, 41s user. With the default
settings GEGL starts swapping when it uses more than 128MB of memory
for buffers; this limit can be raised by setting, for instance,
GEGL_CACHE_SIZE=1024, so that swapping does not start until 1GB of
memory is in use. This leads to similar behavior. The tile backend of
GEGL uses reads and writes on the tiles; using mmap instead could
increase performance.
> If you use uncompressed tiff, you can save a further 15s off the
> runtime. libpng compression is slow, and even with compression off,
> file write is sluggish.
Loading a PNG into a tiled buffer as used by GeglBuffer is kind of
bound to be slow. At the moment GEGL doesn't have a native TIFF loader;
if resources were spent on writing a proper TIFF backend for
GeglBuffer, GEGL would be able to lazily swap in the image data from
TIFF files as needed.
> babl converts to linear float and back with exp() and log(). Using
> lookup tables instead saves 12s.
If the original PNG was 8bit, babl should have a valid fast path that
uses lookup tables to convert it to 32bit linear. For most other
conversions involved in this process babl would likely fall back to
reference conversions that go via 64bit floating point and process
each pixel with lots of logic, permuting components etc. Adding or
fixing fast paths in babl so that they match the reference conversion
should make a lot of the time spent converting pixels in this test
vanish.
> The gegl unsharp operator is implemented as gblur/sub/mul/add. These
> are all linear operations, so you can fold the maths into a single
> convolution. Redoing unsharp as a separable convolution saves 1s.
For smaller radii this is fine; for larger ones it is not. Ideally
GEGL would do whatever is optimal behind the user's back.
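
The folding described in the quoted text can be sketched in one
dimension as follows. This is only an illustration, not GEGL's or
libvips' implementation: a 5-tap binomial kernel stands in for the
Gaussian, and all names are hypothetical. Since
out = in + amount * (in - blur(in)), the whole chain equals a single
convolution with the kernel (1 + amount) * delta - amount * blur.

```c
#include <stddef.h>

#define KSIZE 5

/* Assumption: a small binomial kernel as a stand-in for the Gaussian. */
static const float blur_kernel[KSIZE] =
  { 1 / 16.f, 4 / 16.f, 6 / 16.f, 4 / 16.f, 1 / 16.f };

/* 1D convolution; samples outside the signal repeat the edge value. */
static void convolve1d (const float *src, float *dst, size_t n,
                        const float *k)
{
  for (size_t i = 0; i < n; i++)
    {
      float acc = 0.0f;
      for (int j = 0; j < KSIZE; j++)
        {
          long p = (long) i + j - KSIZE / 2;
          if (p < 0)
            p = 0;
          if (p >= (long) n)
            p = (long) n - 1;
          acc += k[j] * src[p];
        }
      dst[i] = acc;
    }
}

/* Folded kernel: (1 + amount) * delta - amount * blur. */
static void unsharp_kernel (float amount, float *k)
{
  for (int j = 0; j < KSIZE; j++)
    k[j] = -amount * blur_kernel[j];
  k[KSIZE / 2] += 1.0f + amount;
}

/* The chain of separate steps: blur, then sub, mul and add. */
static void unsharp_multipass (const float *src, float *dst, size_t n,
                               float amount)
{
  convolve1d (src, dst, n, blur_kernel);
  for (size_t i = 0; i < n; i++)
    dst[i] = src[i] + amount * (src[i] - dst[i]);
}

/* The same result from a single convolution pass. */
static void unsharp_folded (const float *src, float *dst, size_t n,
                            float amount)
{
  float k[KSIZE];
  unsharp_kernel (amount, k);
  convolve1d (src, dst, n, k);
}
```

The folded version touches each sample once instead of four times,
but its cost still grows with the kernel width, which is why it stops
paying off for large radii.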
> Finally, we don't really need 16-bit output here, 8 is fine. This
> saves only 0.5s for tiff, but 8s for PNG.
Making the test case you used save to 8bit PNG instead gives me 34s
real and 33s user. I am not entirely sure if babl has a 32bit float ->
8bit nonlinear RGBA conversion; it might just be libpng's data
throughput that makes this difference.
save = gegl_node_new_child (gegl,
> Putting all these together, you get the same program running in 2.3s
> real, 4s user. This is still using linear float light internally. If
> you switch to a full 8-bit path you get 1s real, 1.5s user. I realise
> gegl is committed to float, but it's interesting to put a number on
> the cost.
This type of benchmark really stress tests the file loading/saving
parts of the code, where I am fully aware that GEGL is far from
optimal. It is also something that doesn't in any way reflect GIMP's
_current_ use of GEGL, which involves converting 8bit data to and from
float with some very specific formats and then only doing raw
processing. This will of course change in the future.
> Does this sound useful? I think it's maybe a way to weight the
> benefits of the various possible optimisations. I might try running
> the tests on a machine with a faster hard disk.
It is useful, but it would perhaps be even more useful to see similar
results for a test where the loading/saving is taken out and only raw
image data crunching is measured.
With GEGL_SWAP=RAM and BABL_TOLERANCE=0.02 set in the environment, the
latter making babl lenient about the error introduced by its fast
paths, I get the timings listed below. It should be possible to fix
the fast paths in babl to be correct enough to pass the current,
stricter criteria for use, and thus get these results without lowering
standards. Even adding slightly faster but guaranteed-correct
8bit/16bit <-> float conversions would likely improve this type of
benchmark.
16bit output: real: 28.3s user: 26.9s
8bit output: real: 25.1s user: 23.6s
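
On the point about guaranteed-correct 8bit/16bit <-> float
conversions, one cheap way to gain confidence in a candidate pair is
to round-trip every possible integer code through it. The sketch below
does this for a plain 16bit <-> float scaling with no gamma involved.
The function names are hypothetical, and this is not how babl itself
validates its fast paths, which is not shown in this thread.

```c
#include <stdint.h>

/* 16bit code -> float in [0, 1] by plain scaling. */
static float float_from_u16 (uint16_t v)
{
  return v / 65535.0f;
}

/* float -> 16bit code, clamped to [0, 1] and rounded to nearest. */
static uint16_t u16_from_float (float f)
{
  if (!(f > 0.0f))   /* also catches NaN */
    return 0;
  if (f >= 1.0f)
    return 65535;
  return (uint16_t) (f * 65535.0f + 0.5f);
}

/* Count the 16bit codes that do not survive u16 -> float -> u16. */
static long count_roundtrip_mismatches (void)
{
  long bad = 0;
  for (long v = 0; v <= 65535; v++)
    if (u16_from_float (float_from_u16 ((uint16_t) v)) != v)
      bad++;
  return bad;
}
```

With only 65536 inputs the check is exhaustive and runs in well under
a millisecond, so it could run every time a conversion is changed.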
Thank you for looking at this - and I do hope my comments above help
explain some of the reasons for the slower processing.
«The future is already here. It's just not very evenly distributed»
-- William Gibson
Gimp-developer mailing list