On Mon, Jun 13, 2011 at 9:16 PM, Justin Deoliveira <[email protected]> wrote:
> * app-schema vs simple features
>
> Knowing nothing about app-schema at the moment, I believe it can do joins
> via feature chaining. However, my impression is that these relationships are
> configured beforehand and not really created on the fly?
> Correct me if I am wrong.
>
> So perhaps we could just say that we support joins with app-schema and call
> it a day. That said, I do think there is a case for supporting joins with
> simple features as well. And to be honest, working with app-schema, given
> its learning curve, would be out of scope for this project.

Agreed, I would stick with simple features until complex ones reach a level
of ease of use and performance that makes them usable by the average
Joe.

> * cross datastore joins
>
> When talking about joins there are varying levels of complexity. For
> instance, supporting joins between feature types within a JDBC datastore is
> one thing. Joining, say, a shapefile feature type to a JDBC feature type is
> a totally different ball of wax. Cross-datastore joins are something I
> think would be neat... but far from trivial to do in a way that scales. A
> much simpler problem would be joining two feature types within the same
> datastore. Still, unless the datastore can do joins natively (JDBC is
> really the only one here), it is a hard problem. For instance, consider
> joining two shapefile feature types from the same datastore... doable, but
> again difficult to do in a non-naive way.

My take:
- if it's an in-datastore join and the datastore can optimize the join, go for it
- if there are two different data stores, a 1-1 join can still be optimized by
  sorting both the first and the second collection by the join attribute. That
  takes quite a few assumptions, like no other ordering being required and
  sorting being supported, but it is still an option
- the universal fallback is the nested loop kind: one query on the second store
  for each feature in the first.
  This one is susceptible to optimizations too: for example, one could first
  get all the join keys, then get all the features from the second store that
  satisfy the join keys, store everything in a local database, and then perform
  the join using sorting, or at least use the attribute indexes available in the
  local db. Or some other scheme using the file system as secondary storage
  (e.g., one file containing the features for each join key, or for each bucket
  of join keys).
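For illustration, here is a minimal Java sketch of two of the cross-datastore strategies above: a 1-1 merge join over inputs sorted on the join attribute, and the batched-key variant of the fallback. It assumes features are plain Map<String, Object> instances with non-null String join keys for the merge join, uses an in-memory HashMap in place of the local database and its attribute indexes, and models the single query on the second store as a caller-supplied function. JoinStrategies, Pair, feature and fetchByKeys are hypothetical names, not GeoTools API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch: features are Map<String, Object>, not GeoTools SimpleFeatures.
class JoinStrategies {

    /** One joined row: a feature from the first store and its match from the second. */
    static final class Pair {
        final Map<String, Object> left;
        final Map<String, Object> right;

        Pair(Map<String, Object> left, Map<String, Object> right) {
            this.left = left;
            this.right = right;
        }
    }

    /** Builds a feature from alternating name/value arguments. */
    static Map<String, Object> feature(Object... kv) {
        Map<String, Object> f = new HashMap<>();
        for (int k = 0; k < kv.length; k += 2) {
            f.put((String) kv[k], kv[k + 1]);
        }
        return f;
    }

    /** 1-1 merge join: sort both inputs on the (non-null String) key, then walk them in lockstep. */
    static List<Pair> sortMergeJoin(List<Map<String, Object>> first,
                                    List<Map<String, Object>> second, String key) {
        Comparator<Map<String, Object>> byKey = Comparator.comparing(f -> (String) f.get(key));
        List<Map<String, Object>> a = new ArrayList<>(first);
        List<Map<String, Object>> b = new ArrayList<>(second);
        a.sort(byKey);
        b.sort(byKey);
        List<Pair> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            int cmp = byKey.compare(a.get(i), b.get(j));
            if (cmp == 0) {          // keys match: emit and advance both (keys assumed unique)
                out.add(new Pair(a.get(i++), b.get(j++)));
            } else if (cmp < 0) {    // first side is behind: advance it
                i++;
            } else {                 // second side is behind: advance it
                j++;
            }
        }
        return out;
    }

    /** Fallback: fetch all matching second-store features at once, index them locally, probe. */
    static List<Pair> batchedKeyJoin(List<Map<String, Object>> first,
                                     Function<Set<Object>, List<Map<String, Object>>> fetchByKeys,
                                     String key) {
        Set<Object> keys = first.stream().map(f -> f.get(key)).collect(Collectors.toSet());
        Map<Object, List<Map<String, Object>>> index = new HashMap<>();
        for (Map<String, Object> g : fetchByKeys.apply(keys)) {   // stands in for one query
            index.computeIfAbsent(g.get(key), k -> new ArrayList<>()).add(g);
        }
        List<Pair> out = new ArrayList<>();
        for (Map<String, Object> f : first) {
            for (Map<String, Object> g : index.getOrDefault(f.get(key), Collections.emptyList())) {
                out.add(new Pair(f, g));
            }
        }
        return out;
    }
}
```

The merge join only pays off when both stores can hand back features sorted on the join attribute; the batched variant instead trades one query per feature for a single query plus a local index.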

> * query interface
>
> The fact that only some datastores can do joins efficiently makes joining a
> good candidate for QueryCapabilities, with the addition of an
> "isJoiningSupported" method. That interface change is relatively
> straightforward. What is not is how to modify Query (if that is the way to
> go) to support joins. I can think of a few different strategies:
>
> 1. Not modify it at all and come up with a new interface called
> "JoinSupportingDataStore" or something that adds some new methods for joins.
>
> 2. Subclass Query and add some new join methods. Looking around, I actually
> noticed that there is some code in app-schema that does just this, called
> JoiningQuery.
>
> 3. Modify Query directly to add support for joins
>
> Thoughts? Weighing the alternatives, I felt (3) made the most sense,
> especially given how we already support other concepts that not all
> datastores support, like sorting.
>
> So I decided to go further with (3), and added a class called "Join", that
> looks something like the following:
>
> class Join {
>
>   /** the feature type being joined to */
>   String getTypeName();
>
>   /** the attributes from the joined feature type to select */
>   List<PropertyName> getProperties();
>
>   /** the join filter */
>   Filter getJoinFilter();
>
>   /** additional filter to apply to the feature type being joined to */
>   Filter getFilter();
>
> }
>
> And then it was a matter of modifying Query to add a new property.
>
> class Query {
>
>   List<Join> getJoins();
>
> }
>
> So with this API the above query would look something like this:
> Query q = new Query("Persons");
> q.setFilter(PropertyIsEqualTo(PropertyName("Identifier"), Literal(12345)));
> Join j = new Join("Persons");
> j.setJoinFilter(PropertyIsEqualTo(PropertyName("spouse"), PropertyName("Identifier")));
> q.getJoins().add(j);
> That is obviously simplified quite a bit... there are still a few things to
> iron out, like handling name clashes, etc... but that would be the general
> idea. Thoughts?

Makes sense to me.
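To make option (3) concrete, here is a minimal Java sketch of the proposed Query/Join shape together with the QueryCapabilities check, reproducing the Persons/spouse self-join. It is not the GeoTools API: Filter is replaced by java.util.function predicates, features are Map<String, Object>, only a single join is handled, and the plan/evaluate helpers and the in-memory nested-loop fallback are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.BiPredicate;
import java.util.function.Predicate;

// Hypothetical stand-ins for the proposed API; predicates replace GeoTools Filters.
class JoinApiSketch {

    /** Mirrors the proposed Join: joined type name, join filter, extra filter. */
    static class Join {
        final String typeName;
        BiPredicate<Map<String, Object>, Map<String, Object>> joinFilter = (a, b) -> true;
        Predicate<Map<String, Object>> filter = f -> true;

        Join(String typeName) {
            this.typeName = typeName;
        }
    }

    /** Mirrors Query with the new list of joins. */
    static class Query {
        final String typeName;
        Predicate<Map<String, Object>> filter = f -> true;
        private final List<Join> joins = new ArrayList<>();

        Query(String typeName) {
            this.typeName = typeName;
        }

        List<Join> getJoins() {
            return joins;
        }
    }

    /** The capability flag a datastore would advertise. */
    interface QueryCapabilities {
        boolean isJoiningSupported();
    }

    /** Decides where the join runs: natively in the store, or in memory. */
    static String plan(Query q, QueryCapabilities caps) {
        if (q.getJoins().isEmpty()) {
            return "plain query";
        }
        return caps.isJoiningSupported() ? "push join to datastore" : "join in memory";
    }

    /** In-memory nested-loop inner join; each row is [primary, joined]. Handles one join only. */
    static List<List<Map<String, Object>>> evaluate(Query q, List<Map<String, Object>> primary,
                                                    List<Map<String, Object>> joinedType) {
        if (q.getJoins().size() != 1) {
            throw new IllegalArgumentException("sketch handles exactly one join");
        }
        Join j = q.getJoins().get(0);
        List<List<Map<String, Object>>> rows = new ArrayList<>();
        for (Map<String, Object> p : primary) {
            if (!q.filter.test(p)) {
                continue;                       // primary filter, e.g. Identifier = 12345
            }
            for (Map<String, Object> g : joinedType) {
                if (j.filter.test(g) && j.joinFilter.test(p, g)) {
                    rows.add(Arrays.asList(p, g));
                }
            }
        }
        return rows;
    }
}
```

A store whose capabilities report joining as supported would translate the same Query into native SQL; everything else would fall back to something like evaluate.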

> * joined features
> Another major question is what the result of a join should look like. Given
> that a query currently returns features, I thought it best to stick with
> that rather than come up with some new class to represent a tuple
> (although maybe that is something worth considering). I thought of a few
> different alternatives. To illustrate, consider two feature types:
> f1 (name, geometry)
> f2 (name, foo, geom)
> 1. Return a single feature with attributes from joined feature types "rolled
> into it". So the resulting joined feature would look like:
>   f'(name, geometry, name, foo, geom)
> 2. Return a single feature that contains attributes for joined features:
>   f'(name, geometry, f2)
> 3. Return a single feature that contains attributes for all features in the
> join
>   f'(f1,f2)
> Each approach has its issues. (1), for instance, requires that we break
> simple feature rules, since we have two attributes with the same local
> name.
> (2) requires attribute types that are SimpleFeatureTypes, which I don't
> think technically violates simple feature rules, although admittedly it is
> not something that happens often.
> (3) is more or less the same as (2), but better represents the notion of a
> "tuple". The question is what id to give the feature, if any.
> I'm pretty open to suggestions on this one... I imagine there is probably a
> better solution than any of those three. In the end, for the prototype I
> decided to go with (2). It seemed the least invasive.

All can work imho. 2 is nice in that it's still a feature and has an evident
identifier. 3 works too: the id could be the concatenation of the respective
ones (though that would scale really poorly for a large number of joins).
1 can be an option if we add attribute aliasing, so that we can have
f1_name, f1_geometry, f2_name, f2_foo, f2_geom
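The three result shapes can be compared with a small Java sketch that models features as ordered Map<String, Object> instances rather than SimpleFeatures. flatten shows option 1 with the f1_/f2_ aliasing suggested above, nest shows option 2 (the joined feature as a nested attribute), and tupleId shows a concatenated id for option 3. The class and method names and the "+" separator are hypothetical choices.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; ordered maps stand in for features.
class JoinedFeatureShapes {

    /** Option 1 with aliasing: one flat feature, each attribute prefixed by its type alias. */
    static Map<String, Object> flatten(List<String> aliases, List<Map<String, Object>> parts) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (int i = 0; i < parts.size(); i++) {
            for (Map.Entry<String, Object> e : parts.get(i).entrySet()) {
                out.put(aliases.get(i) + "_" + e.getKey(), e.getValue());
            }
        }
        return out;
    }

    /** Option 2: the primary feature's attributes plus the joined feature as one nested attribute. */
    static Map<String, Object> nest(Map<String, Object> primary, String joinedName,
                                    Map<String, Object> joined) {
        Map<String, Object> out = new LinkedHashMap<>(primary);
        out.put(joinedName, joined);
        return out;
    }

    /** Option 3: a tuple id built from the member ids; it grows with every joined feature. */
    static String tupleId(List<String> memberIds) {
        return String.join("+", memberIds);
    }
}
```

Only the flattened form puts every attribute at the top level, which is why it is the one that needs aliasing to avoid the duplicate "name" attribute.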

> * join types
> Joins come in many flavors... inner vs outer, etc. The WFS spec specifies
> inner join semantics. But I guess we could add some notion of join type to
> the Join class so that a user could specify which type of join they want?
> Or maybe just stick with inner joins, since that is the requirement and the
> most common case?

Outer joins really come in handy on some occasions; it should not be too
hard to add support for them in the in-memory joins, and they are a keyword
away for database joins... not sure, but if it's not too hard, having them
would be nice.
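The point that outer joins are cheap to add in memory can be illustrated with a hash join that takes a join type: an inner join drops unmatched features from the first collection, while a left outer join keeps them with an empty right-hand side. This is a minimal Java sketch assuming Map-based features; OuterJoinSketch, JoinType, Row and feature are hypothetical names, and null keys are treated as never matching, as in SQL.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: Map-based features, in-memory hash join with a selectable join type.
class OuterJoinSketch {

    enum JoinType { INNER, LEFT_OUTER }

    /** A joined row; right is null when a left outer join finds no match. */
    static final class Row {
        final Map<String, Object> left;
        final Map<String, Object> right;

        Row(Map<String, Object> left, Map<String, Object> right) {
            this.left = left;
            this.right = right;
        }
    }

    /** Builds a feature from alternating name/value arguments. */
    static Map<String, Object> feature(Object... kv) {
        Map<String, Object> f = new HashMap<>();
        for (int k = 0; k < kv.length; k += 2) {
            f.put((String) kv[k], kv[k + 1]);
        }
        return f;
    }

    static List<Row> hashJoin(List<Map<String, Object>> left, List<Map<String, Object>> right,
                              String leftKey, String rightKey, JoinType type) {
        // Build phase: index the right-hand features by join key, skipping null keys.
        Map<Object, List<Map<String, Object>>> index = new HashMap<>();
        for (Map<String, Object> r : right) {
            Object k = r.get(rightKey);
            if (k != null) {
                index.computeIfAbsent(k, x -> new ArrayList<>()).add(r);
            }
        }
        // Probe phase: look up each left feature; a null key never matches.
        List<Row> out = new ArrayList<>();
        for (Map<String, Object> l : left) {
            Object k = l.get(leftKey);
            List<Map<String, Object>> matches = Collections.emptyList();
            if (k != null) {
                matches = index.getOrDefault(k, Collections.emptyList());
            }
            if (matches.isEmpty() && type == JoinType.LEFT_OUTER) {
                out.add(new Row(l, null));   // keep the unmatched left feature
            }
            for (Map<String, Object> r : matches) {
                out.add(new Row(l, r));
            }
        }
        return out;
    }
}
```

On the database side the same choice maps onto the SQL INNER JOIN and LEFT OUTER JOIN keywords.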

> That is about it for now... sorry, it's a lot of random thoughts, I know. I
> currently have a basic implementation working in the jdbc module. It needs
> testing and handling of some more special cases, but with it I have been
> able to do a variety of joins, both "standard" and spatial.

Cool!

Cheers
Andrea


-- 
-------------------------------------------------------
Ing. Andrea Aime
GeoSolutions S.A.S.
Tech lead

Via Poggio alle Viti 1187
55054  Massarosa (LU)
Italy

phone: +39 0584 962313
fax:      +39 0584 962313

http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.youtube.com/user/GeoSolutionsIT
http://www.linkedin.com/in/andreaaime
http://twitter.com/geowolf

-------------------------------------------------------

_______________________________________________
Geotools-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/geotools-devel
