All,

I'd like to propose a change to the Query API that will enable SQL-style
projections (as opposed to geospatial projections).

The Query class currently has a variant that takes an array of String
property names, which selects a subset of the feature schema's attributes.
This is especially useful in, e.g., GeoServer, because rendering does not
necessarily require all attributes.  Each data store can optimize these
queries to transfer only the minimal set of data required to satisfy them.

GeoTools also has a transformation API (in gt-transform) that allows an
application to transform features (rename attributes, compute expressions
over multiple properties, etc.) in a very flexible way.  However, the
transform API does not delegate to the underlying data store.  Rather, it
builds a TransformFeature* that maps the data to the new schema on the fly.

I propose that we unify these two concepts and allow data stores to
optimize the transformation.  For instance, PostGIS could optimize a query
such as:

select id as identifier, strConcat(id, name) as uniqname,
       buffer(geom, 0.01) as bufferedgeom
from geom table where ....

which combines renaming of attributes, mapping of multiple attributes to a
single attribute, and geometric computations in a single query.  In the
case of the GeoMesa data store, these transformations can be executed in
parallel across a distributed set of compute and storage resources.

To support this capability, we would need to modify the Query object to
take a data structure very similar to the transform API.  This consists of
Definitions and contained Expressions.  The FeatureSource will then compute
a new schema based on the transform and return a collection representing
the mapped result.  The data store could then optimize the population of
the collection using whatever capabilities it has available to it.
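
To make the shape of that data structure concrete, here is a minimal,
self-contained sketch.  It is not the gt-transform or Query API: the
ProjectionSketch class, its nested Definition class, and the map-based
"feature" are hypothetical stand-ins, and the sample definitions only mirror
the rename and concatenation from the SQL above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-ins for illustration only; these are NOT the gt-transform
// or Query classes. A "feature" is modelled as an attribute-name -> value map.
final class ProjectionSketch {

    /** One output attribute: its new name plus the expression that computes it. */
    static final class Definition {
        final String name;
        final Function<Map<String, Object>, Object> expression;

        Definition(String name, Function<Map<String, Object>, Object> expression) {
            this.name = name;
            this.expression = expression;
        }
    }

    /** The derived schema is the ordered list of output attribute names. */
    static List<String> deriveSchema(List<Definition> definitions) {
        List<String> schema = new ArrayList<>();
        for (Definition d : definitions) {
            schema.add(d.name);
        }
        return schema;
    }

    /** Evaluates every definition against one source feature. */
    static Map<String, Object> project(Map<String, Object> feature,
                                       List<Definition> definitions) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Definition d : definitions) {
            out.put(d.name, d.expression.apply(feature));
        }
        return out;
    }

    /** Sample definitions: a rename and a two-attribute concatenation. */
    static List<Definition> sampleDefinitions() {
        return Arrays.asList(
            new Definition("identifier", f -> f.get("id")),
            new Definition("uniqname",
                f -> String.valueOf(f.get("id")) + f.get("name")));
    }

    public static void main(String[] args) {
        Map<String, Object> feature = new LinkedHashMap<>();
        feature.put("id", "42");
        feature.put("name", "road");
        List<Definition> defs = sampleDefinitions();
        System.out.println(deriveSchema(defs));
        System.out.println(project(feature, defs));
    }
}
```

In this sketch, project() plays the role of the on-the-fly fallback.  A store
that can push the work down would use the same definitions to build its own
native query instead of evaluating them feature by feature.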

The first two DataStores targeted for this functionality could be GeoMesa
and ContentDataStore.  ContentDataStore can construct a TransformFeature*
to do the transformation on the fly while GeoMesa could interpret the
transformation and parallelize the computation.

Thoughts?

Also, a question on procedure: should this email thread turn into a proposal
on Confluence or JIRA (if it gets that far)?  How do these things normally
work?



P.S. I've hacked a version of this into the GeoMesa code base as I had a
need for it in an application.  The hacked version intercepts the Query in
FeatureSource.getFeatures(), treats the String array as a
Definition/Expression string, and builds the new schema.  It then rewrites
the Query against the original schema and sets up the parallelized
transformation.  Works quite well and provides *very* efficient map-reduce
style computations using (E)CQL.
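
As a rough illustration of the interception step, the sketch below splits
each entry of the String array into an output name and an expression string.
The "name=expression" syntax and the PropertySpecSketch class are assumptions
made for this example only; the exact string format used in the hack is not
described above, and the expression text is left unparsed here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of GeoTools or GeoMesa.
final class PropertySpecSketch {

    /**
     * Maps each property string to (output name -> expression text).
     * ASSUMPTION: entries look like "name=expression"; an entry without '='
     * is a plain attribute name that maps to itself.
     */
    static Map<String, String> parse(String[] properties) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String property : properties) {
            int eq = property.indexOf('=');
            if (eq < 0) {
                String name = property.trim();
                result.put(name, name);
            } else {
                // Split on the first '=' only, so the expression may contain more.
                result.put(property.substring(0, eq).trim(),
                           property.substring(eq + 1).trim());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] props = {"geom", "uniqname=strConcat(id, name)"};
        System.out.println(parse(props));
    }
}
```

The keys of the resulting map give the attribute order of the new schema.  The
expression strings would still need to be parsed as (E)CQL to find which
original attributes the rewritten Query must request.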
_______________________________________________
GeoTools-Devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/geotools-devel
