Hi Ian,
Neat! Thanks!
I want to do two things: first, I want to share the list of
FeatureSource modifications that I've encountered so far, and second to
draw an analogy to other query engines.
Since I pretty much never work with SQL databases, I have encountered a
number of problems that SQL Views (or database table views) probably
would have solved. As those were unavailable, the options have been
do-without/work-around or implement a GeoTools DataStore (or use a
GS-based solution). Here's my quick list of things I've seen:
1. Desire to enrich data from a GeoTools DataStore with some external
datasource. The first example I encountered with this involved getting
data from a triple store (so there was no sensible GT DataStore for
it). I believe this case can be thought of as a left outer join (where
the GT DS is on the left). Your WPS seems to work in the same vein.
One handy thing that this DS did was to split up a CQL filter into the
part which could be passed to the GT DS and the remainder.
2. Desire to glue together two or more DataStores. GeoMesa has a
Merged View DS to do this. The assumption is that each DS has a
FS with the same SFT. This is akin to a table union.
3. Desire to query across layers. The GeoServer Query Layer extension
does this. One co-worker suggested that this could be thought of as a
nested inner join.
4. Desire to modify or add columns based on functions. (That's what
this PR is aiming at.)
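To make item 1 a bit more concrete, here is a rough sketch in plain Python (not GeoTools code) of splitting an AND-ed filter and then doing the left outer join. The names Predicate, split_filter, and enriched_query are made up for illustration, and the sketch assumes the filter is a simple conjunction where each predicate touches exactly one attribute; real CQL filters are richer than that.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List, Set, Tuple


@dataclass(frozen=True)
class Predicate:
    """Hypothetical stand-in for one AND-ed clause of a CQL filter."""
    attribute: str                   # the single attribute this clause tests
    test: Callable[[object], bool]   # must tolerate None for missing values


def split_filter(
    predicates: List[Predicate], native_attributes: Set[str]
) -> Tuple[List[Predicate], List[Predicate]]:
    """Split a conjunction into (pushed_down, residual) parts.

    Clauses on attributes the wrapped store knows about can be delegated
    to it; everything else has to wait until after the join.
    """
    pushed = [p for p in predicates if p.attribute in native_attributes]
    residual = [p for p in predicates if p.attribute not in native_attributes]
    return pushed, residual


def enriched_query(
    features: Iterable[Dict[str, object]],
    lookup: Dict[object, Dict[str, object]],
    key: str,
    predicates: List[Predicate],
    native_attributes: Set[str],
) -> Iterator[Dict[str, object]]:
    """Left outer join of features (left side) with an external lookup."""
    pushed, residual = split_filter(predicates, native_attributes)
    for feature in features:
        # In a real store this check would be handed to the wrapped DataStore.
        if not all(p.test(feature.get(p.attribute)) for p in pushed):
            continue
        # Left outer join: the feature survives even without a lookup match.
        row = {**feature, **lookup.get(feature[key], {})}
        # The residual clauses only make sense once the extra columns exist.
        if all(p.test(row.get(p.attribute)) for p in residual):
            yield row
```

With no residual clauses every feature from the left side comes back, enriched where the lookup has a match; a clause on an enriched column drops the rows that lack it.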
All of this makes me think of the Catalyst query engine in Spark.
Apache Spark offers an SQL interface. The SQL query is broken down into
a logical plan, which is optimized, and then converted into a physical
plan. There is a DataSource interface which is similar to the GT
DataStore API. Taken together, I'm starting to think of GeoServer in
that same role as a query engine. The various places that a
FeatureCollection is wrapped effectively line up with parts of a
physical plan. The challenges I see are 1) that there is presently
little in terms of "explainability" (e.g. one cannot inspect a query's
execution plan) and 2) there's no obvious place to plug any sort of
optimization rule (instead any wrapping point needs to provide a
FeatureCollection which will delegate visitors appropriately, etc.)
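As a toy illustration of those two missing pieces, here is a small sketch in plain Python of a plan tree that can be printed ("explain") and rewritten by pluggable rules. Plan, explain, merge_filters, and optimize are invented names; this is neither how Catalyst is implemented nor an existing GeoServer/GeoTools API, just the shape of the idea under the assumption that each wrapping step has a single child.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Plan:
    """One node of a hypothetical, single-child query plan."""
    op: str                         # e.g. "Scan", "Filter", "Transform"
    detail: str                     # human-readable description of the step
    child: Optional["Plan"] = None  # the step this one wraps, if any


def explain(plan: Plan, depth: int = 0) -> str:
    """Render the plan as an indented tree, outermost wrapper first."""
    line = "  " * depth + f"{plan.op}({plan.detail})"
    if plan.child is None:
        return line
    return line + "\n" + explain(plan.child, depth + 1)


# A rule takes a node and returns either the same node or a rewritten one.
Rule = Callable[[Plan], Plan]


def merge_filters(plan: Plan) -> Plan:
    """Example rule: collapse Filter-over-Filter into one AND-ed Filter."""
    if plan.op == "Filter" and plan.child is not None and plan.child.op == "Filter":
        merged = f"{plan.detail} AND {plan.child.detail}"
        return Plan("Filter", merged, plan.child.child)
    return plan


def optimize(plan: Plan, rules: List[Rule]) -> Plan:
    """Rewrite bottom-up, re-applying rules at each node until none fires."""
    child = optimize(plan.child, rules) if plan.child is not None else None
    node = Plan(plan.op, plan.detail, child)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            rewritten = rule(node)
            if rewritten != node:
                node, changed = rewritten, True
    return node
```

The point of the rule list is that a new optimization could be registered in one place, rather than every wrapper having to know how to delegate to the layer beneath it.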
Anyhow, hopefully the concrete list helps us see places where we could
focus on supporting individual query patterns / use cases. For my second
point, I don't have a concrete goal; I suppose I'm sharing this thought
/ analogy to see if others have had similar thoughts (or have an obvious
plan around it).
Cheers,
Jim
On 11/3/2020 3:33 PM, Ian Turton wrote:
Currently, it's a WPS process - the code is at
https://github.com/ianturton/tablejoin
It's built around a FilterVisitor that replaces join values with
literals I think.
Ian
On Tue, 3 Nov 2020 at 19:25, Jim Hughes <jhug...@ccri.com> wrote:
Hi Ian,
Interesting. Is the Joining datastore somewhere online already?
I'll admit to having written similar datastores previously. One
of my ideas for using this kind of DataStore would be to leverage
functions like those in the GS Query Layer extension to pull
attributes from other layers. Similarly, it ought to be trivial
to pull a column from a CSV on disk.
As one starts down that path, there are syntactical annoyances (I
cannot see an easy way to say "get a bunch of columns from over
there" without having a function call per column), and performance
considerations (reading from a file/database repeatedly for the same
information among separate queries is less than ideal, so caching
would be nice). That said, in a pinch, this approach may provide
a really, really quick solution.
Cheers,
Jim
On 11/3/20 1:53 PM, Ian Turton wrote:
This looks interesting. I wonder how hard it would be to merge in
the work I did on Joining datastores that could create "views"
across non-JDBC datastores, so you could add a CSV to a geometry
in another store. I should dig that code out and look at it
again. Maybe a Christmas lockdown project?
Ian
On Tue, 3 Nov 2020 at 18:19, Jim Hughes <jhug...@ccri.com> wrote:
Hi all,
At various times, Jody and I have chatted about having a "CQL View" in
GeoServer (or something akin to the SQL Views) that'd leverage the
Transform module and allow one to add columns to a FeatureSource based
on expressions. Such expressions could use existing or custom CQL
functions, and that'd open a world of possibilities.
Last week, I asked Jody about this idea again and he indicated that one
would really just need to write a DataStoreFactory. I banged out an
initial implementation as a starting place for the conversation. This
implementation provides a way to project down to a subset of attributes
as well as add new columns/attributes to a FeatureStore.
If folks like it as-is, I'd be happy to add unit tests and
documentation wherever we see fit. Thanks in advance for some feedback
on this idea!
https://github.com/geotools/geotools/pull/3196/files
Cheers,
Jim
_______________________________________________
GeoTools-Devel mailing list
GeoTools-Devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geotools-devel
--
Ian Turton