Re: [Geotools-devel] Extra IRC 2, in which we revisit the axis order and come to a concensus

Bryce L Nordgren Wed, 02 Aug 2006 11:21:57 -0700


Andrea Aime <[EMAIL PROTECTED]> wrote on 08/02/2006 10:50:04 AM:


> Ah, on the same note, there is also the consistency problem:
> with grid coverages the sole idea of having grids in long/lat even if their
> native format is lat/long makes my head hurt. Here again we do the eventual
> reprojection only as the last stage after cutting/zooming whatever to
> avoid scary performance problems (Simone correct me if I'm wrong).

Ahhh.
Grid axes are fake.  The only axis which matters is {address}.
Multidimensional constructs are mathematical fictions which exist for the
sole purpose of calculating an {address}.  The only difference between
long/lat and lat/long is the formula used to compute the address into the
buffer.  Ergo, swapping axes is trivial because you can use either formula
without copying the actual buffer.
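
To make the "two formulas, one buffer" point concrete, here is a minimal
sketch.  The GridView class and its method names are made up for
illustration (they are not GeoTools API), and it assumes a plain
stride-based address formula:

```python
class GridView:
    """Hypothetical index object: a shape plus strides over a shared 1D buffer."""

    def __init__(self, buffer, shape, strides):
        self.buffer = buffer      # the only "real" thing
        self.shape = shape        # (extent of axis 0, extent of axis 1)
        self.strides = strides    # buffer step per unit step along each axis

    def address(self, i, j):
        # The whole job of the grid: turn an index into an address.
        return i * self.strides[0] + j * self.strides[1]

    def get(self, i, j):
        return self.buffer[self.address(i, j)]

    def swapped(self):
        # Swapping axes only swaps the formula; the buffer is shared, not copied.
        return GridView(self.buffer, self.shape[::-1], self.strides[::-1])


buffer = list(range(12))                   # 3 rows x 4 columns, row-major
lat_lon = GridView(buffer, (3, 4), (4, 1))
lon_lat = lat_lon.swapped()
```

Here lat_lon.get(1, 2) and lon_lat.get(2, 1) read the same buffer element,
and both views hold a reference to the same list.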

Actual storage of data in a buffer will mirror the source, whatever it is.
(Note, the 1D buffer may be ordered by a space filling curve and not by a
simple 2D row-major or column-major ordering.)
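
As one example of a space filling curve ordering, here is a sketch of a
Z-order (Morton) address formula.  Picking Z-order is my assumption for
illustration; the source data could just as well use some other curve.
The function name is hypothetical.

```python
def morton_address(row, col, bits=16):
    """Interleave the bits of row and col into a single Z-order address."""
    address = 0
    for bit in range(bits):
        address |= ((col >> bit) & 1) << (2 * bit)       # col bits land on even positions
        address |= ((row >> bit) & 1) << (2 * bit + 1)   # row bits land on odd positions
    return address


# Addresses of the four cells in the top-left 2x2 block of the grid.
block = [morton_address(r, c) for r in range(2) for c in range(2)]
```

The point is only that the grid index is still (row, col); the formula
that maps it to an address is what changed.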

Data access to a Coverage is provided by geospatial location, which is used
as another fiction to calculate an index into the grid, which is in turn
used to calculate an address into the buffer.

Geospatial Location == fake.
Grid                == fake.
Buffer              == real.

In between is an index object of some kind.
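
The chain location -> index -> address could be sketched roughly like
this.  Everything here is assumed for illustration: a north-up grid with
square cells, an upper-left origin, row-major storage, and hypothetical
function names.

```python
def location_to_index(lon, lat, origin_lon, origin_lat, cell_size):
    """Fiction #1: geospatial location -> (row, col) grid index."""
    col = int((lon - origin_lon) // cell_size)
    row = int((origin_lat - lat) // cell_size)   # rows count downward from the top edge
    return row, col


def index_to_address(row, col, n_cols):
    """Fiction #2: grid index -> address in a row-major buffer."""
    return row * n_cols + col


N_ROWS, N_COLS = 4, 5
buffer = list(range(N_ROWS * N_COLS))            # the real thing

row, col = location_to_index(2.5, 1.5, origin_lon=0.0, origin_lat=4.0, cell_size=1.0)
value = buffer[index_to_address(row, col, N_COLS)]
```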

Performance relates to the "real" and not the "fake".

What really will hit you performance-wise is accessing data differently
than it is stored (e.g. using a row-major iterator on a column-major
grid, or grabbing a single-band image out of a band-interleaved-by-pixel
dataset).  These are the things that can't be fixed with an index object.
Conversely, even if your "index object" sports a data order which you would
think would give you efficient data access, it is the actual order in the
buffer which matters in terms of cache misses (L1/2/N cache for memory;
RAM cache for disks).
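
A rough way to see the mismatch without timing anything is to look at how
far apart consecutive buffer addresses are.  This sketch assumes a
row-major buffer and uses the mean address jump as a crude stand-in for
cache behaviour; the function names are hypothetical.

```python
def addresses(n_rows, n_cols, order):
    """Yield addresses into a row-major buffer, visited in the given loop order."""
    if order == "row-major":
        for r in range(n_rows):
            for c in range(n_cols):
                yield r * n_cols + c
    else:  # column-major iteration over the same row-major buffer
        for c in range(n_cols):
            for r in range(n_rows):
                yield r * n_cols + c


def mean_jump(addrs):
    """Average distance between consecutive addresses touched."""
    addrs = list(addrs)
    jumps = [abs(b - a) for a, b in zip(addrs, addrs[1:])]
    return sum(jumps) / len(jumps)


matched = mean_jump(addresses(100, 100, "row-major"))
mismatched = mean_jump(addresses(100, 100, "column-major"))
```

With matching order every step moves one element; with the mismatched
order nearly every step moves at least a full row, whatever the index
object claims.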

Perhaps the best thing you can do, if you *know* how data are to be
accessed, is to preprocess the data so that they are stored in the same
way you want to access them.  For instance: preprocess satellite data into
band-sequential order; read large 2D images in and save with the "tiling"
option; etc.  Of course, if you do this and you guess wrong (e.g. some
schmuck requests the spectral signature of a pixel after you save as band
sequential), then you're back to the bad performance.  One needs to
identify usage patterns, then optimize for the most common usage, making
performance worse for non-conforming users. :) Can't have everything.
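
For the satellite case, the preprocessing step might look something like
this sketch.  The function name and the toy data are invented; a real
dataset would be read from and written back to disk rather than held in
lists.

```python
def bip_to_bsq(bip, n_pixels, n_bands):
    """Reorder band-interleaved-by-pixel data into band-sequential order."""
    bsq = [None] * (n_pixels * n_bands)
    for pixel in range(n_pixels):
        for band in range(n_bands):
            # BIP address: pixel * n_bands + band;  BSQ address: band * n_pixels + pixel
            bsq[band * n_pixels + pixel] = bip[pixel * n_bands + band]
    return bsq


bip = ["p0b0", "p0b1", "p1b0", "p1b1", "p2b0", "p2b1"]   # 3 pixels, 2 bands
bsq = bip_to_bsq(bip, n_pixels=3, n_bands=2)

band0_image = bsq[0:3]        # one band is now a contiguous run
pixel1_signature = bsq[1::3]  # but a spectral signature is now a strided read
```

That last line is the schmuck's request: cheap in the original BIP layout,
scattered after the conversion.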

Alternatively, if the only problem is that the renderers produce an image
by using row-major iteration (note this is different from row-major
storage), just swap the indices in the iteration loop in the
rendering/resampling code.  Data access patterns must match data storage to
be efficient.  It's the golden rule.  If you can't control how data are
stored, you need to be flexible in how you access them. :)
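
"Swap the indices in the iteration loop" could be as simple as the sketch
below, which picks the loop nesting from the strides so that the
fastest-varying loop follows the smallest stride.  The function name is
hypothetical, and it assumes a 2D grid described by (shape, strides).

```python
def iterate_in_storage_order(shape, strides):
    """Yield (i, j) so that the axis with the smaller stride varies fastest."""
    n_i, n_j = shape
    if strides[1] <= strides[0]:
        # Row-major style storage: j is contiguous, so j goes in the inner loop.
        for i in range(n_i):
            for j in range(n_j):
                yield i, j
    else:
        # Column-major style storage: i is contiguous, so i goes in the inner loop.
        for j in range(n_j):
            for i in range(n_i):
                yield i, j


shape = (2, 3)
col_major_strides = (1, 2)   # column-major storage of a 2x3 grid
visited = [i * col_major_strides[0] + j * col_major_strides[1]
           for i, j in iterate_in_storage_order(shape, col_major_strides)]
```

Either way the loop body still sees (i, j); only the visiting order
changes to walk the buffer sequentially.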

(BTW, this email uses the proverbial "you", not you = Andrea.)

Bryce


