[Gegl-developer] new GEGL samplers: what they are and where they are going (REVISED)

Nicolas Robidoux Tue, 01 Sep 2009 07:23:02 -0700

========
Abstract
========

Eric Daoust and Adam Turcotte are about to wrap up a pair of Google
Summer of Code 2009 projects having to do with new samplers.  In this
document, I explain what the samplers do, how they (may) fit within
the GEGL library, and where they are going (most likely: feedback
would be nice).


I also briefly indicate what will need to be done when non-affine
transformations (perspective, warp) are implemented in GEGL so that
they use the samplers tuned for downsampling (for example, thumbnail
production) with maximal benefit.

=============
QUICK SUMMARY
=============

In total, twelve new samplers were programmed:

six samplers tuned for transformations in which good downsampling is
more important than good upsampling (for example, thumbnail
production):

  downsize, downsharp, downsmooth and their simplified/faster versions
  downsizefast, downsharpfast and downsmoothfast

and

six samplers tuned for transformations in which good upsampling is
more important than good downsampling (for example, image enlargement
and rotation):

  upsize, upsharp, upsmooth and their simplified/faster versions
  upsizefast, upsharpfast and upsmoothfast

(Note: Perspective transformations often are both upsampling and
downsampling, depending on the location and even the direction.)

Every sampler can be used for any resampling task (enlarging,
shrinking, rotating, perspective, warping...). "Sampler tuned for X"
means that the sampler is built to excel at task X. It does not mean
that it does poorly at other tasks.

The naming convention which Adam, Eric and I used (which could be
changed) is as follows:

  "up*" samplers are tuned for upsampling, "down*" samplers are tuned
  for downsampling,

  "*sharp*" samplers maximise sharpness at the expense of increased
  aliasing ("jaggies"); *sharp* samplers behave more like nearest
  neighbour than the other methods.  "*smooth*" samplers minimise
  jaggies at the expense of additional blur. "*size*" samplers are a
  compromise between the two.
  
  "*fast are simplified/faster versions. If "fast" is omitted, the
  corresponding method is the higher quality/slower version.

As implemented for GEGL, none of the above methods are parameterised.

The distinguishing feature of the methods (except upsharpfast) is that
they do not introduce any haloing artifacts, and yet do well at
antialiasing (esp. upsize, upsmooth and the future version of
downsmooth).

================
Relevant patches
================

http://bugzilla.gnome.org/show_bug.cgi?id=588336
(remove the yafr sampler)

http://bugzilla.gnome.org/show_bug.cgi?id=588180
(remove the sharp sampler, and add upsharp)

http://bugzilla.gnome.org/show_bug.cgi?id=588193
(add upsharpfast, upsizefast and upsmoothfast)

http://bugzilla.gnome.org/show_bug.cgi?id=592498
(add upsize)

http://bugzilla.gnome.org/show_bug.cgi?id=592515
(add upsmooth)

http://bugzilla.gnome.org/show_bug.cgi?id=588016
(add downsharp, downsize, downsmooth)

http://bugzilla.gnome.org/show_bug.cgi?id=592349
(add downsharpfast, downsizefast and downsmoothfast)

(Non-sampler patches which Eric and Adam produced are not included here.)

====================
Where this is going?
====================

If it is felt that GEGL does not need "pretty good fast samplers," the
"*fast" methods could simply not be integrated into GEGL (or made
"public"). If and when GEGL has "quality levels," one could write a
driver which selects the fast method when the chosen quality level is
low. In any case, programming the "*fast" methods was not a waste: The
s/nohalo family of methods is multi-stage, hence the "cheaper"
versions are used as stages for the "non-fast" ones; in addition, the
improved fast versions programmed for GSoC will be ported to
VIPS. This being said, it would be nice if they could be merged into
GEGL trunk and then removed, so that the git repository have a record
of them.

Keeping the "*fast" methods private (or not merging them at all) would
reduce the total number of "public" GEGL methods from twelve to
six. The following would further reduce the total number of methods to
four (or eight if "*fast" samplers are kept).

  Downsampling schemes: The downsize and downsharp samplers are very
  similar. For this reason, it would make sense to only keep the
  downsize and downsmooth samplers (same with downsizefast and
  downsmoothfast). Note that I am not 100% sure that users will prefer
  the downsize method to the downsharp method: it depends on whether
  they dislike aliasing (jaggies) more than than blur. (I dislike
  aliasing, hence my preference for downsize.)

  The upsize and upsharp samplers are very different from each other,
  but research performed by Chantal Racette as part of her Laurentian
  University Honours Thesis during the Summer suggests that the method
  underlying upsharp should be used as a final stage for the
  multi-stage methods upsize and upsmooth, yielding superior methods
  which would replace the current upsize and upsmooth and make the
  current upsharp obsolete.

So, in the end, there could only be four new additional GEGL samplers:

  Two samplers tuned for downsampling: downsize and downsmooth (with
  fast versions downsizefast and downsmoothfast if desired)

  Two samplers tuned for upsampling: upsize and upsmooth (with fast
  versions upsizefast and upsmoothfast)

Every single sampler is a good general purpose method. The downsize
method, in particular, will be found to give acceptable results in
"all" situations (no worse, and generally better, than bilinear when
upsampling, and much better than bilinear when downsampling). For this
reason, I think that downsize may be a good default GEGL sampler.

===================
DETAILED DISCUSSION
===================

===========
Terminology
===========

  "Samplers" are methods like nearest neighbour, bilinear, bicubic,
  lanczos, Catmull-Rom etc. This is what they do: Given a bitmap image
  (which, at least in principle, specifies pixel values at pixel
  locations only), samplers are used to obtain pixel values at
  arbitrary locations, which is required when resizing, rotating,
  applying perspective transformations etc.

  Many key sampler properties are easier to explain in terms of the
  "reconstructed surface," which is the surface obtained by "plotting"
  the pixel values at all possible locations (not only the original
  pixel locations) as if they were heights (the discussion is simpler
  if one thinks about greyscale images).

  A sampler is differentiable if the reconstructed surface has a well
  defined tangent at every point (there is no "ridge," "sharp peak" or
  other types of "corners"). Such ridges are visible, for example,
  when using bilinear to perform a high ratio enlargement.

  An "interpolatory sampler" is one such that if a value is requested
  at an original pixel location, the pixel value returned is the
  original pixel value, unchanged. In other words, the reconstructed
  surface goes through the original image data points. Nearest
  neighbour, bilinear, bicubic, lanczos and Catmull-Rom are
  interpolatory; smoothing bicubic spline (the current GEGL bicubic
  default) is not. Many non-interpolatory samplers are smoothing. In
  addition, many exact-area samplers are non-interpolatory.

  A co-monotone sampler has the following property: If the data (pixel
  values) is non-increasing or non-decreasing (in some direction,
  generally parallel to the axes), then the reconstructed surface has
  the same property between the corresponding locations. A co-monotone
  sampler does not introduce "haloing."  Nearest neighbour, bilinear
  and smoothing bicubic splines are co-monotone; Lanczos and
  Catmull-Rom are not co-monotone (and they often introduce
  significant haloing).

  A strongly bounded sampler has the following property: The resampled
  value is always between between the values at the four nearest input
  pixel locations. If a sampler is monotone, then it is strongly
  bounded; the converse is not true. A strongly bounded sampler can
  only add haloing at sub-pixel scale, and the amount of haloing is
  strongly limited. Nearest neighbour, bilinear and smoothing bicubic
  splines are strongly bounded; Lanczos and Catmull-Rom are
  not. Bounded samplers do not require clamping.

  A sampler is "exact on linears" if the reconstructed surface is a
  plane (far enough from the boundary) whenever the input image's
  pixel values all lie on a plane. Bilinear, smoothing bicubic splines
  and Catmull-Rom are exact on linears; nearest neighbour and Lanczos
  are not (I need to double check for Lanczos).

==================================
Description of the twelve samplers
==================================

See the source code for details and additional comments.

===============
Upsharp sampler
===============

"Upsharp" is an implementation of the symmetrized MP-Quadratic method
(Monotonicity Preserving Hermite interpolation with derivative
estimated by fitting a parabola (Quadratic polynomial)). Basically, it
is Catmull-Rom with derivatives clamped so as to ensure monotonicity.

1D MP-quadratic (for curve, not surface, interpolation) is
described in

  Accurate Monotone Cubic Interpolation, by Hung T. Huynh, published
  in the SIAM Journal on Numerical Analysis, Volume 30, Issue 1
  (February 1993), pages 57-100, 1993. ISSN:0036-1429.

and in

  NASA technical memorandum 103789, which can be downloaded from
  ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19910011517_1991011517.pdf

In order to ensure reflexion symmetry about diagonal lines, 1D
MP-quadratic is performed two different ways---horizontally then
vertically, and vertically then horizontally---and
averaged. (Symmetry about 45 degree lines is not automatically
respected because MP-quadratic is a nonlinear method: interpolating
horizontally then vertically does not necessarily give the same as
interpolating vertically then horizontally.)

Upsharp is interpolatory, co-monotone, exact on linears and
differentiable.

Future improvements: This method is likely to replace the final stage
of the upsize and upsmooth methods, which will make them smoother and,
at least in the case of upsize, less aliased. In addition, Chantal
Racette has recently figured out how to implement this method using
one third less minmod function evaluations, which should make it run
considerably faster: specifically, a much simplified version of

  http://en.wikipedia.org/wiki/Bicubic_interpolation

can be used instead of the current approach. Finally, current research
suggests that the co-monotonicity enforcing derivative clamping is
overkill and that the result would be less aliased without a
noticeable increase in haloing if it only enforced strong boundedness.

===================
Upsharpfast sampler
===================

Upsharpfast is an efficient (noticeably faster than the current GEGL
bicubic) implementation of Catmull-Rom:

http://en.wikipedia.org/wiki/Cubic_Hermite_spline

Upsharpfast is interpolatory, exact on linears and differentiable.

==============
Upsize sampler
==============

Upsize is an implementation of Nohalo level 2 ("Nohalo" = "no halo"),
which is obtained by performing two diagonal straightening
subdivisions of the input image, finishing with bilinear
interpolation.

Nohalo level 2 is an unpublished, state of the art method.

Upsize is interpolatory, co-monotone and exact on linears.

It is especially suited for enlarging colour photographs (which
typically are sub-critical, that is, have smallest relevant
details/transitions which are several pixel wide).

Future improvements: Research by Chantal Racette suggests that one
level of subdivision (that is, Nohalo level 1) followed by a variant
of symmetrized MP-quadratic instead of bilinear may yield a better
method than Nohalo level 2.

==================
Upsizefast sampler
==================

Upsizefast is an implementation of Nohalo level 1, which is described
in

  CPU, SMP and GPU implementations of Nohalo level 1, a fast co-convex
  antialiasing image resampler, by Nicolas Robidoux, Minglun Gong,
  John Cupitt, Adam Turcotte and Kirk Martinez, published in the
  Proceedings of the Canadian Conference on Computer Science &
  Software Engineering, C3S2E 2009, Montreal, Quebec, Canada, May
  19-21, 2009, ACM International Conference Proceedings Series, pages
  185-195.  http://doi.acm.org/10.1145/1557626.1557657

Nohalo level 1 consists of one diagonal straightening subdivision
followed by bilinear interpolation.

Upsizefast is interpolatory, co-monotone and exact on linears.

================
Upsmooth sampler
================

Upsmooth is an implementation of (the simplest version of) Snohalo
level 2 ("Snohalo" = "smooth no halo"), which consists of Nohalo level
2 preceded by a de-noising and antialiasing smoothing convolution
tuned so that a sharp black on white diagonal line or interface is
straight following the first subdivision.

Snohalo level 2 is an unpublished, state of the art method.

Upsmooth is co-monotone and exact on linears. Although upsmooth is not
differentiable (it will be if and when MP-quadratic is used as final
stage) this is only apparent for large upsampling ratios.

Upsmooth is strongly anti-aliasing. It is especially suited for
enlarging text, "old school" game characters, and heavily
(destructively) compressed images and photographs (jpg with heavy
compression, for example).

Future improvements: It is possible that applying smoothing between
every level instead of only initially (as was done for the mock ups in
Adam Turcotte's GSoC web site), or doing one level of subdivision and
then using a relative of symmetrized MP-quadratic as a finishing
scheme, yield better results.

====================
Upsmoothfast sampler
====================

Upsmoothfast is an implementation of Snohalo level 1, which consists
of Nohalo level 1 preceded by the antialiasing blur.

Snohalo level 1 is another unpublished, state-of-the-art method.

Upsmoothfast is co-monotone and exact on linears.

=================
Downsharp sampler
=================

Downsharp is an exact area method which is similar to exact area box
filtering with an averaging rectangle that has sides 1/sqrt(2) of
those of the smallest rectangle (with sides aligned with untransformed
coordinate axes) which contains the approximate inverse image of an
"output" pixel area (a square centered at the sampling location in
transformed space, with sides, of length one, aligned with the
axes). The inverse image is approximated using the inverse Jacobian
matrix of the transformation. The averaging is done over the "minmod
surface" which is implicit to the S/Nohalo methods.

The 1/sqrt(2) factor ensures that when the transformation does not
downsample in any direction, the sampler is interpolatory. It also
implies that, of the six downsamplers, it is the one which behaves
most like nearest neighbour (although it is very different from it).

Downsharp is yet another unpublished method.

Downsharp is co-monotone and exact on linears. When upsampling, it is
interpolatory.

Downsharp is well-suited for producing thumbnails and other
transformations (perspective?) in which large areas become a single
pixel.

Future improvements: Fix things so that downsampling ratios larger
than about 100 are better handled. The simplest way would be to have
the downsamplers use a larger buffer than the current 64x64 (128x128,
for example); otherwise, using branching and multiple "get pointer to
input buffer" calls if necessary would do the job.

================
Downsize sampler
================

Downsize is like downsharp except that the rectangle always fully
contains the approximate inverse image of a pixel area.

Downsize is yet another unpublished method. It was inspired
somewhat by the PDL::Transform method of Craig DeForest.

Downsize is co-monotone. When upsizing (no rotation or warp), it is
interpolatory and exact on linears.

Downsize is well-suited for producing thumbnails etc.

Future improvements: Like for downsharp, better handling of very large
downsampling ratios. It is possible that averaging over the actual
approximate inverse image, or over a related disk or ellipse, give
better results than over the smallest containing rectangle.

==================
Downsmooth sampler
==================

Downsmooth is like downsize except that the rectangle is (at least)
twice the size required to contain the approximate inverse image of a
pixel area.

Downsmooth is yet another unpublished method.

Downsmooth is co-monotone and exact on linears.

Downsmooth is well-suited for producing thumbnails etc.

Future improvements: Like for downsize, better large downsampling
ratio handling. Also: The rectangular averaging box is not optimal:
Current research suggests that a "cross" would produce thumbnails with
both less blur and less aliasing.

==============================================
Downsharpfast, Downsizefast and Downsmoothfast
==============================================

They are like their "slow" counterparts except that the reconstructed
surface which is averaged is the nearest neighbour one (the piecewise
constant reconstruction) instead of the minmod surface.  

They are co-monotone. Downsharpfast is exact on linears when
upsampling and downsizefast is exact on linears when enlarging.

=================================================================
When perspective and more general warping are implemented in GEGL
=================================================================

In order to adjust the size of the averaging rectangle, the
downsamplers need information about the transformation which uses
them. We have implemented the samplers so that they use inverse
Jacobian matrix information. This is trivial to obtain in GEGL because
affine transformations already compute the inverse of their linear
part. 

When non-affine transformations get implemented, similar information
will have to be passed on to the down methods. The inverse Jacobian
matrix does not need to be exact: Any halfway reasonable approximation
will do (of course, for perspective transformations, the exact inverse
Jacobian matrix is easy to compute). Note that when the transformation
upsamples (in one or both directions), the corresponding side lengths
(which are, in principle, small, since a pixel area in the transformed
image "comes" from a very small area of the input image) are clamped
up to 1; for this reason, it is not very important to program an
inverse Jacobian matrix which is accurate where the transformation
"blows up."

Adam Turcotte can provide additional details regarding
implementation. Also see the source code of his patch

http://bugzilla.gnome.org/show_bug.cgi?id=588016
_______________________________________________
Gegl-developer mailing list
Gegl-developer@lists.XCF.Berkeley.EDU
https://lists.XCF.Berkeley.EDU/mailman/listinfo/gegl-developer

[Gegl-developer] new GEGL samplers: what they are and where they are going (REVISED)

Reply via email to