I think this is yet another tile-cache problem.
Georg Acher wrote:
> On Thu, Apr 05, 2001 at 12:45:52PM -0500, Kelly Martin wrote:
> > Hm, it does not. The issue with whirlpinch is that there's only a
> > weak locality relationship between destination pixels (which are
> > iterated across the image) and source pixels (which are fetched with
> > the pixel fetcher). I haven't looked too closely at your blocking
> That is right, but destination and source for themselves have good locality
> (ie. the next pixel isn't 500 pixels away from the last).
This is an important observation.
> > patch, but I suspect that much the same improvement would be had by
> > using a pixel region (which respects tiles) to iterate across the
> > destination region.
I agree, but the current algorithm writes two rows (top and bottom) at
the same time, so it is not immediately possible to use the standard
pixel region iterator.
> That is possible... Is there a filter that definitely uses the pixel region
> stuff? Most filters I have seen only use one row, which may not be enough
> "locally", since it uses only one pixel but has to fill a whole cacheline
> (4/8 pixel). I will try whether I can speed up bumpmap also, since this
> takes also the magical 30s and is much more often used in scripts than the
> whirl&pinch module.
> I don't know how large a tile is, but since IMHO the major impact of
> blocking seems to come from the CPU cache, I suspect that is too big for
> older CPUs. I have done the whirl&pinch blocking thing about three years ago
> (and forgot to send the patch), and tried it on an Alpha21164 and a P5.
I think you're looking in the wrong direction here. Similar to the
bumpmap (see my other message) I strongly suspect that the tile-cache is
the problem here.
You can check this for yourself by profiling your code. I expect that
there is hardly any difference in processor time between your version
and the original but that there is an important difference in the number
of calls to gimp_tile_get and gimp_tile_put. (Sorry, did not try that
myself.)
Requesting tiles from the main Gimp process is pretty expensive: it
involves several context switches and a lot of copying. This is orders
of magnitude slower than copying data within the plugin process from the
tile cache to some other buffer. The whirlpinch plugin uses a cache size
that is only big enough to hold the output tiles; there is no room for
input tiles. This leads to a series of cache misses and horrible
performance. Your modifications improve the cache-hit ratio because they
process larger chunks of data.
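
To make the cache-miss argument concrete, here is a toy simulation, not
the plugin's code. It assumes a 64-pixel tile, an LRU replacement policy,
and a transpose mapping (destination (x,y) reads source (y,x)) as a
stand-in for the whirl; all names are hypothetical. It only counts how
often a tile would have to be requested.

```c
#include <stddef.h>

#define MAX_SLOTS 64

/* Toy LRU tile cache: slot i holds tile id[i], last used at tick[i]. */
typedef struct {
    int  id[MAX_SLOTS];
    long tick[MAX_SLOTS];
    int  slots;    /* capacity, clamped to MAX_SLOTS */
    long now;      /* logical clock, advanced on every access */
    long misses;   /* accesses that would need a tile request */
} TileCache;

static void cache_init(TileCache *c, int slots)
{
    if (slots > MAX_SLOTS)
        slots = MAX_SLOTS;
    c->slots = slots;
    c->now = 0;
    c->misses = 0;
    for (int i = 0; i < slots; i++) {
        c->id[i] = -1;     /* empty slot */
        c->tick[i] = 0;    /* oldest possible, so it is filled first */
    }
}

static void cache_touch(TileCache *c, int tile)
{
    int victim = 0;

    c->now++;
    for (int i = 0; i < c->slots; i++) {
        if (c->id[i] == tile) {          /* hit: refresh and return */
            c->tick[i] = c->now;
            return;
        }
        if (c->tick[i] < c->tick[victim])
            victim = i;                  /* remember least recently used */
    }
    c->misses++;                         /* miss: replace the LRU slot */
    c->id[victim] = tile;
    c->tick[victim] = c->now;
}

/* Scan a size x size destination row by row; each destination pixel
 * (x,y) reads source pixel (y,x). Returns the number of tile misses. */
long count_misses_row_order(int size, int tile, int slots)
{
    TileCache c;
    int per_row = size / tile;
    int src_base = per_row * per_row;    /* source ids follow dest ids */

    cache_init(&c, slots);
    for (int y = 0; y < size; y++) {
        for (int x = 0; x < size; x++) {
            cache_touch(&c, (y / tile) * per_row + x / tile);
            cache_touch(&c, src_base + (x / tile) * per_row + y / tile);
        }
    }
    return c.misses;
}
```

For a 256x256 image, 4 slots (one row of output tiles only) give 2048
misses, while 8 slots (room for the input tiles of one row as well) give
32, which is one miss per tile.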
However, it is not a fundamental solution. The easiest short-term
solution is to increase the cache size by the number of input tiles
that are needed to compute one row of the output (I don't know if this
can be easily determined).
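
A rough sketch of that short-term sizing, under assumptions: the helper
names are hypothetical, and the number of input tiles is left as a
caller-supplied estimate because I don't know how to derive it in
general. In a plugin the result would presumably be handed to
gimp_tile_cache_ntiles, with the tile width taken from gimp_tile_width.

```c
/* Number of tiles needed to span `extent` pixels when each tile is
 * `tile` pixels wide (rounds up for a partial last tile). */
unsigned long tiles_spanning(unsigned long extent, unsigned long tile)
{
    return (extent + tile - 1) / tile;
}

/* Cache size in tiles: the output tiles for `rows_at_once` rows that
 * are written simultaneously, plus `input_tiles`, the caller's estimate
 * of the input tiles needed to compute one output row (an assumption,
 * not something this sketch can work out by itself). */
unsigned long cache_tiles_for_rows(unsigned long width,
                                   unsigned long tile_w,
                                   unsigned long rows_at_once,
                                   unsigned long input_tiles)
{
    return rows_at_once * tiles_spanning(width, tile_w) + input_tiles;
}
```

For example, a 1000-pixel-wide drawable with 64-pixel tiles, two output
rows written at once, and an estimate of 16 input tiles would ask for
2 * 16 + 16 = 48 tiles.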
The more fundamental solution is to rewrite the algorithm so that it
works on a tile-by-tile basis. In that case the minimum cache size is
only 2 (for the output tiles) plus the number of input tiles that are
needed to compute one output tile. Of course it is better to use a
somewhat larger cache size because adjacent output tiles can use the
same input tiles.
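
To illustrate the difference in traversal order, here is a small sketch
with hypothetical names. It models the smallest possible cache, one held
output tile and one held input tile, and again uses a transpose mapping
(destination (x,y) reads source (y,x)) as a stand-in for the real
transform, so each output tile needs exactly one input tile. It counts
how often a held tile has to be swapped.

```c
/* Two-slot model: one destination tile and one source tile are held. */
typedef struct {
    int  held_dst, held_src;
    int  tile, per_row;
    long fetches;
} TwoSlot;

static void two_slot_init(TwoSlot *s, int size, int tile)
{
    s->held_dst = -1;
    s->held_src = -1;
    s->tile = tile;
    s->per_row = size / tile;
    s->fetches = 0;
}

/* Process destination pixel (x,y), which reads source pixel (y,x). */
static void visit(TwoSlot *s, int x, int y)
{
    int dst = (y / s->tile) * s->per_row + x / s->tile;
    int src = (x / s->tile) * s->per_row + y / s->tile;

    if (dst != s->held_dst) { s->held_dst = dst; s->fetches++; }
    if (src != s->held_src) { s->held_src = src; s->fetches++; }
}

/* Row by row across the whole image. */
long fetches_row_order(int size, int tile)
{
    TwoSlot s;

    two_slot_init(&s, size, tile);
    for (int y = 0; y < size; y++)
        for (int x = 0; x < size; x++)
            visit(&s, x, y);
    return s.fetches;
}

/* One output tile at a time, finishing each tile before moving on. */
long fetches_tile_order(int size, int tile)
{
    TwoSlot s;

    two_slot_init(&s, size, tile);
    for (int ty = 0; ty < size; ty += tile)
        for (int tx = 0; tx < size; tx += tile)
            for (int y = ty; y < ty + tile; y++)
                for (int x = tx; x < tx + tile; x++)
                    visit(&s, x, y);
    return s.fetches;
}
```

On a 256x256 image with 64-pixel tiles the row order swaps tiles 2048
times, while the tile order swaps only 32 times (each of the 16 output
tiles and its one input tile is fetched once). A real whirl needs more
than one input tile per output tile, so the actual numbers will differ.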
Gimp-developer mailing list