I've had a stab at a quick hack of gegl-0.1.6 to use libvips (another
demand-driven image processing library) as the backend for batch
processing. I think it is maybe an interesting way to look at gegl
performance, for this use case at least.
This has some very strong limitations. First, it will not work
efficiently with interactive destructive operations, like "paint a
line". This would need area cache invalidation in vips, which is a way
off. Secondly, I've only implemented a few operations (load / crop /
affine / unsharp / save / process), so all you can do is some very
basic batch processing. It should work for dynamic graphs (change the
parameters on a node and just downstream nodes will recalculate) but
it'd need a "display" node to be able to test that.
There's a README with some more detail on how it works, a test program
and some timings.
If I run the test program linked against gegl-0.1.6 on a 5,000 x 5,000
pixel RGB PNG image on my laptop (a c2d at 2.4GHz), I get 96s real,
44s user. I tried experimenting with various settings for GEGL_SWAP
and friends, but I couldn't get it to go faster than that, I probably
missed something. Perhaps gegl's disk cache plus my slow laptop
harddrive are slowing it down.
Linked against gegl-vips with the operations set to exactly match
gegl's processing, the same thing runs in 27s real, 38s user. So it
looks like some tuning of the disc cache, or maybe even turning it off
for batch processing, where you seldom need pixels more than once,
could give gegl a very useful speedup here. libvips has a threading
system which is on by default and does double-buffered write-behind,
which also help.
If you use uncompressed tiff, you can save a further 15s off the
runtime. libpng compression is slow, and even with compression off,
file write is sluggish.
The alpha channel is not needed in this case, dropping it saves about
5s real time.
babl converts to linear float and back with exp() and log(). Using
lookup tables instead saves 12s.
The gegl unsharp operator is implemented as gblur/sub/mul/add. These
are all linear operations, so you can fold the maths into a single
convolution. Redoing unsharp as a separable convolution saves 1s.
Finally, we don't really need 16-bit output here, 8 is fine. This
saves only 0.5s for tiff, but 8s for PNG.
Putting all these together, you get the same program running in 2.3s
real, 4s user. This is still using linear float light internally. If
you switch to a full 8-bit path you get 1s real, 1.5s user. I realise
gegl is committed to float, but it's interesting to put a number on
Does this sound useful? I think it's maybe a way to weight the
benefits of the various possible optimisations. I might try running
the tests on a machine with a faster hard disk.
Gimp-developer mailing list