Hi Ian,

Neat!  Thanks!

I want to do two things:  first, I want to share the list of FeatureSource modifications that I've encountered so far, and second to draw an analogy to other query engines.

Since I pretty much never work with SQL databases, I have encountered a number of problems that SQL Views (or database table views) probably would have solved.  As those were unavailable, the options have been to do without, work around the gap, or implement a GeoTools DataStore (or use a GS-based solution).  Here's my quick list of things I've seen:

1.  Desire to enrich data from a GeoTools DataStore with some external datasource.  The first example I encountered of this involved getting data from a triple store (so there was no sensible GT DataStore for it).  I believe this case can be thought of as a left outer join (where the GT DS is on the left).  Your WPS seems to work in the same vein.  One handy thing that this DS did was to split up a CQL filter to extract the part which could be passed to the GT DS.

2.  Desire to glue together one or more DataStores.  GeoMesa has a Merged View DS to do this.  The assumption is that each DS has a FS with the same SFT.  This is akin to a table union.

3.  Desire to query across layers.  The GeoServer Query Layer extension does this.  One co-worker suggested that this could be thought of as a nested inner join.

4.  Desire to modify or add columns based on functions.  (That's what this PR is aiming at.)
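
To make patterns 1, 2 and 4 a bit more concrete, here is a minimal sketch in plain Java.  It assumes a "feature" is nothing more than an attribute map; the class and method names (ViewSketches, leftOuterJoin, union, withColumn) are made up for illustration and are not GeoTools, GeoMesa or GeoServer API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in: a "feature" is just an ordered attribute map.
// None of these names come from GeoTools, GeoMesa or GeoServer.
final class ViewSketches {

    // Pattern 1: left outer join. Every left feature survives; attributes
    // from the external lookup are added only when the key matches.
    static List<Map<String, Object>> leftOuterJoin(
            List<Map<String, Object>> left,
            String key,
            Map<Object, Map<String, Object>> external) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> feature : left) {
            Map<String, Object> joined = new LinkedHashMap<>(feature);
            Object keyValue = feature.get(key);
            Map<String, Object> extra = keyValue == null ? null : external.get(keyValue);
            if (extra != null) {
                joined.putAll(extra);
            }
            out.add(joined);
        }
        return out;
    }

    // Pattern 2: union of sources that are assumed to share one schema.
    @SafeVarargs
    static List<Map<String, Object>> union(List<Map<String, Object>>... sources) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (List<Map<String, Object>> source : sources) {
            out.addAll(source);
        }
        return out;
    }

    // Pattern 4: add (or overwrite) a column computed from each feature.
    // The input features are copied, not modified.
    static List<Map<String, Object>> withColumn(
            List<Map<String, Object>> features,
            String name,
            Function<Map<String, Object>, Object> expression) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> feature : features) {
            Map<String, Object> copy = new LinkedHashMap<>(feature);
            copy.put(name, expression.apply(feature));
            out.add(copy);
        }
        return out;
    }
}
```

A real DataStore would stream features and carry a schema rather than copy lists of maps, so this only shows the relational shape of each pattern, not how one would implement it.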

All of this makes me think of the Catalyst query engine in Spark.  Apache Spark offers an SQL interface.  The SQL query is broken down into a logical plan, which is optimized and then converted into a physical plan.  There is a DataSource interface which is similar to the GT DataStore API.  Taken together, I'm starting to think of GeoServer in that same role as a query engine.  The various places where a FeatureCollection is wrapped effectively line up with parts of a physical plan.  The challenges I see are 1) that there is presently little in terms of "explainability" (e.g. one cannot inspect a query's execution plan) and 2) there's no obvious place to plug in any sort of optimization rule (instead, any wrapping point needs to provide a FeatureCollection which will delegate visitors appropriately, etc.).
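
As a rough illustration of those two missing pieces, here is a toy plan tree in plain Java with an explain() method and a pluggable rewrite rule.  Everything in it (PlanNode, Scan, FilterNode, AddColumn, Optimizer) is hypothetical; it is neither Catalyst's API nor anything that exists in GeoTools or GeoServer, and predicates are plain strings rather than real filters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical plan nodes; not the Spark Catalyst or GeoTools API.
abstract class PlanNode {
    final PlanNode child; // null for a leaf

    PlanNode(PlanNode child) { this.child = child; }

    abstract String describe();

    abstract PlanNode withChild(PlanNode newChild);

    // "Explainability": render the plan as an indented chain of nodes.
    String explain() {
        StringBuilder sb = new StringBuilder();
        int depth = 0;
        for (PlanNode n = this; n != null; n = n.child) {
            for (int i = 0; i < depth; i++) sb.append("  ");
            sb.append(n.describe()).append('\n');
            depth++;
        }
        return sb.toString();
    }
}

final class Scan extends PlanNode {
    final String source;
    Scan(String source) { super(null); this.source = source; }
    String describe() { return "Scan(" + source + ")"; }
    PlanNode withChild(PlanNode c) { return this; } // leaves have no child
}

final class FilterNode extends PlanNode {
    final String attribute; // the single attribute the predicate reads
    final String predicate;
    FilterNode(String attribute, String predicate, PlanNode child) {
        super(child);
        this.attribute = attribute;
        this.predicate = predicate;
    }
    String describe() { return "Filter(" + predicate + ")"; }
    PlanNode withChild(PlanNode c) { return new FilterNode(attribute, predicate, c); }
}

final class AddColumn extends PlanNode {
    final String name;
    final String expression;
    AddColumn(String name, String expression, PlanNode child) {
        super(child);
        this.name = name;
        this.expression = expression;
    }
    String describe() { return "AddColumn(" + name + " = " + expression + ")"; }
    PlanNode withChild(PlanNode c) { return new AddColumn(name, expression, c); }
}

final class Optimizer {
    private final List<UnaryOperator<PlanNode>> rules = new ArrayList<>();

    // The plug-in point: a rule returns the same instance when it does not apply.
    Optimizer addRule(UnaryOperator<PlanNode> rule) { rules.add(rule); return this; }

    // Optimize the child first, then retry the rules on this node until none fires.
    PlanNode optimize(PlanNode plan) {
        if (plan == null) return null;
        PlanNode current = plan.withChild(optimize(plan.child));
        for (UnaryOperator<PlanNode> rule : rules) {
            PlanNode rewritten = rule.apply(current);
            if (rewritten != current) return optimize(rewritten);
        }
        return current;
    }

    // Example rule: a filter that does not read the added column can run first,
    // i.e. closer to the underlying source.
    static PlanNode pushFilterBelowAddColumn(PlanNode node) {
        if (node instanceof FilterNode && node.child instanceof AddColumn) {
            FilterNode filter = (FilterNode) node;
            AddColumn add = (AddColumn) node.child;
            if (!filter.attribute.equals(add.name)) {
                return add.withChild(filter.withChild(add.child));
            }
        }
        return node;
    }
}
```

For example, building new FilterNode("pop", "pop > 1000", new AddColumn("density", "pop / area", new Scan("cities"))) and running it through new Optimizer().addRule(Optimizer::pushFilterBelowAddColumn) yields a plan whose explain() lists AddColumn first, then Filter, then Scan, so the filter ends up next to the source.  A filter on "density" itself would stay where it is.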

Anyhow, hopefully the concrete list helps us see places where we could focus on supporting individual query patterns / use cases. For my second point, I don't have a concrete goal; I suppose I'm sharing this thought / analogy to see if others have had similar thoughts (or have an obvious plan around it).

Cheers,

Jim

On 11/3/2020 3:33 PM, Ian Turton wrote:
Currently, it's a WPS process - the code is at https://github.com/ianturton/tablejoin

It's built around a FilterVisitor that replaces join values with literals, I think.



Ian

On Tue, 3 Nov 2020 at 19:25, Jim Hughes <jhug...@ccri.com <mailto:jhug...@ccri.com>> wrote:

    Hi Ian,

    Interesting.  Is the Joining datastore somewhere online already?

    I'll admit to having written similar datastores previously.  One
    of my ideas for using this kind of DataStore would be to leverage
    functions like those in the GS Query Layer extension to pull
    attributes from other layers.  Similarly, it ought to be trivial
    to pull a column from a CSV on disk.

    As one starts down that path, there are syntactical annoyances (I
    cannot see an easy way to say "get a bunch of columns from over
    there" without having a function call per column) and performance
    considerations (reading from a file/database repeatedly for the
    same information across separate queries is less than ideal, so
    caching would be nice).  That said, in a pinch, this approach may
    provide a really, really quick solution.

    Cheers,

    Jim

    On 11/3/20 1:53 PM, Ian Turton wrote:
    This looks interesting.  I wonder how hard it would be to merge in
    the work I did on Joining datastores, which could create "views"
    across non-JDBC datastores so you could add a CSV to a geometry
    in another store.  I should dig that code out and look at it
    again.  Maybe a Christmas lockdown project?

    Ian

    On Tue, 3 Nov 2020 at 18:19, Jim Hughes <jhug...@ccri.com
    <mailto:jhug...@ccri.com>> wrote:

        Hi all,

        At various times, Jody and I have chatted about having a "CQL
        View" in GeoServer (or something akin to the SQL Views) that'd
        leverage the Transform module and allow one to add columns to
        a FeatureSource based on expressions.  Such expressions could
        use existing or custom CQL functions, and that'd open a world
        of possibilities.

        Last week, I asked Jody about this idea again and he indicated
        that one would really just need to write a DataStoreFactory.
        I banged out an initial implementation as a starting place for
        the conversation.  This implementation provides a way to
        project down to a subset of attributes as well as add new
        columns/attributes to a FeatureStore.

        If folks like it as-is, I'd be happy to add unit tests and
        documentation wherever we see fit.  Thanks in advance for some
        feedback on this idea!

        https://github.com/geotools/geotools/pull/3196/files
        <https://github.com/geotools/geotools/pull/3196/files>

        Cheers,

        Jim



        _______________________________________________
        GeoTools-Devel mailing list
        GeoTools-Devel@lists.sourceforge.net
        <mailto:GeoTools-Devel@lists.sourceforge.net>
        https://lists.sourceforge.net/lists/listinfo/geotools-devel
        <https://lists.sourceforge.net/lists/listinfo/geotools-devel>



-- Ian Turton



