Hi all, I'm writing to solicit feedback from the community about an improvement to App-Schema's joining implementation that we have been working on lately.
In short, we would like to introduce support for the SQL encoding of filters on nested attributes, which, in the current implementation, end up in the post-filter and are evaluated in memory, after all the features have been loaded. For those who might be unfamiliar with it, the inner workings of the current joining implementation are clearly illustrated by Niels Charlier in this presentation: https://www.seegrid.csiro.au/wiki/pub/Infosrvices/GeoserverAppSchemaJoining/joining_presentation.pdf The aforementioned presentation mentions the fact that filtering on nested attributes is "currently done with post-filtering", and suggests a possible strategy for translating them to SQL. What we have implemented so far is a slight variation of the approach suggested by Niels. Let's take GeoSciML's MappedFeature (container) and GeologicUnit (nested) features as example. The following filter on the gml:description attribute of the nested GeologicUnit type: <ogc:PropertyIsLike> <ogc:PropertyName>gsml:specification/gsml:GeologicUnit/gml:description</ogc:PropertyName> <ogc:Literal>*sedimentary*</ogc:Literal> </ogc:PropertyIsLike> would be ignored by the current joining implementation (i.e. it is regarded as a post-filter): SELECT ... FROM "appschematest"."MAPPEDFEATURE" ORDER BY "appschematest"."MAPPEDFEATURE"."ID" ASC, "appschematest"."MAPPEDFEATURE"."PKEY" SELECT ... FROM "appschematest"."GEOLOGICUNIT" INNER JOIN "appschematest"."MAPPEDFEATUREPROPERTYFILE" ON ( "MAPPEDFEATUREPROPERTYFILE"."GEOLOGIC_UNIT_ID" = "GEOLOGICUNIT"."GML_ID") ORDER BY "appschematest"."MAPPEDFEATUREPROPERTYFILE"."ID" ASC, "appschematest"."GEOLOGICUNIT"."GML_ID" ASC, "appschematest"."GEOLOGICUNIT"."PKEY" but would be translated to the following SQL queries in the new implementation: SELECT ... FROM "appschematest"."MAPPEDFEATURE" INNER JOIN ( SELECT DISTINCT "ID" FROM "appschematest"."MAPPEDFEATURE" *WHERE EXISTS (* * SELECT "chain_link_1"."PKEY" FROM "appschematest"."GEOLOGICUNIT" "chain_link_1" * * WHERE UPPER("chain_link_1"."TEXTDESCRIPTION") LIKE '%SEDIMENTARY%' AND "appschematest"."MAPPEDFEATURE"."GEOLOGIC_UNIT_ID" = "chain_link_1"."GML_ID")* ) "temp_alias_used_for_filter" ON ( "MAPPEDFEATURE"."ID" = "temp_alias_used_for_filter"."ID" ) ORDER BY "appschematest"."MAPPEDFEATURE"."ID" ASC, "appschematest"."MAPPEDFEATURE"."PKEY" SELECT ... FROM "appschematest"."GEOLOGICUNIT" INNER JOIN "appschematest"."MAPPEDFEATUREPROPERTYFILE" ON ( "MAPPEDFEATUREPROPERTYFILE"."GEOLOGIC_UNIT_ID" = "GEOLOGICUNIT"."GML_ID") INNER JOIN ( SELECT DISTINCT "ID" FROM "appschematest"."MAPPEDFEATUREPROPERTYFILE" *WHERE EXISTS (* * SELECT "chain_link_1"."PKEY" FROM "appschematest"."GEOLOGICUNIT" "chain_link_1" * * WHERE UPPER("chain_link_1"."TEXTDESCRIPTION") LIKE '%SEDIMENTARY%' AND "appschematest"."MAPPEDFEATUREPROPERTYFILE"."GEOLOGIC_UNIT_ID" = "chain_link_1"."GML_ID")* ) "temp_alias_used_for_filter" ON ( "MAPPEDFEATUREPROPERTYFILE"."ID" = "temp_alias_used_for_filter"."ID" ) ORDER BY "appschematest"."MAPPEDFEATUREPROPERTYFILE"."ID" ASC, "appschematest"."GEOLOGICUNIT"."GML_ID" ASC, "appschematest"."GEOLOGICUNIT"."PKEY" We have seen significant performance improvements, especially with the total number of features in the DB is high (several thousands), but only a few of them would satisfy the filter. At present, we don't aim for a comprehensive implementation of nested filters encoding; as a consequence, our implementation has the following limitations: 1. only binary comparison, PropertyIsLike and PropertyIsNull filters are translated to SQL 2. polymorphic mappings are not supported 3. filters involving multiple nested attributes are not supported In case anybody wants to take a look before a formal PR, the code can be found in this branch of my personal geotools fork: https://github.com/ridethepenguin/geotools/tree/my_nested_filters_join As this enhancement only applies when joining is enabled, unit tests have been added in the gs-app-schema-test module of geoserver: https://github.com/ridethepenguin/geoserver/tree/my_nested_filters_join_tests Last but not least, I would like to point out a couple of fixes (I hope) I've introduced as I went along. The first is in UnmappingFilterVisitor: https://github.com/geotools/geotools/compare/master...ridethepenguin:my_nested_filters_join#diff-de21f72fdbda25bc69ea142fad909c8eR821 I've introduced it to make encoding of joining for simple content possible, but it seems to me to be a more general fix to avoid having nulls in the encoded filter. The second is in XPathUtil: https://github.com/geotools/geotools/compare/master...ridethepenguin:my_nested_filters_join#diff-11339740d8ff9ccff758df9400eecfc5R97 It seems to me the current implementation of startsWith is broken, as it considers gml:name[2] and gml:name[3] paths to be the same, while they may be mapped to different tables (e.g. see SimpleAttributeFeatureChainWfsTest) Any feedback is higly appreciated :-) Thanks! -- Best regards, Stefano Costa == GeoServer Professional Services from the experts! Visithttp://goo.gl/it488V for more information. == Dott. Stefano Costa Senior Software Engineer GeoSolutions S.A.S.Via di Montramito 3/A 55054 Massarosa (LU) Italy phone: +39 0584 962313 fax: +39 0584 1660272 http://www.geo-solutions.ithttp://twitter.com/geosolutions_it ------------------------------------------------------- AVVERTENZE AI SENSI DEL D.Lgs. 196/2003 Le informazioni contenute in questo messaggio di posta elettronica e/o nel/i file/s allegato/i sono da considerarsi strettamente riservate. Il loro utilizzo è consentito esclusivamente al destinatario del messaggio, per le finalità indicate nel messaggio stesso. Qualora riceviate questo messaggio senza esserne il destinatario, Vi preghiamo cortesemente di darcene notizia via e-mail e di procedere alla distruzione del messaggio stesso, cancellandolo dal Vostro sistema. Conservare il messaggio stesso, divulgarlo anche in parte, distribuirlo ad altri soggetti, copiarlo, od utilizzarlo per finalità diverse, costituisce comportamento contrario ai principi dettati dal D.Lgs. 196/2003. The information in this message and/or attachments, is intended solely for the attention and use of the named addressee(s) and may be confidential or proprietary in nature or covered by the provisions of privacy act (Legislative Decree June, 30 2003, no.196 - Italy's New Data Protection Code).Any use not in accord with its purpose, any disclosure, reproduction, copying, distribution, or either dissemination, either whole or partial, is strictly forbidden except previous formal approval of the named addressee(s). If you are not the intended recipient, please contact immediately the sender by telephone, fax or e-mail and delete the information in this message that has been received in error. The sender does not give any warranty or accept liability as the content, accuracy or completeness of sent messages and accepts no responsibility for changes made after they were sent or for other risks which arise as a result of e-mail transmission, viruses, etc.
------------------------------------------------------------------------------
_______________________________________________ GeoTools-Devel mailing list GeoTools-Devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/geotools-devel