I have a proposal and some proof-of-concept code for speeding up queries with
filters that contain chained-attributes. Before I spend more time on the
code, I wanted to float the idea on the dev list to see what the plugin
authors think. If the idea won't work, I'd rather find out now than after I
put in a few days to flesh out the prototype.
First, let me lay out my understanding of how the app-schema plugin does
filtering on chained-attributes to make sure I understand the code. The
first few steps below are for manual construction of a filter and iterating
over the results. Obviously, this would be a bit different when integrated
into a full workflow like processing a WFS request. Please correct me if I
misunderstand any part of how the plugin works.
1) A filter is created that involves a chained-feature attribute. Taking the
schemas used in the FeatureChainingTest, this could be something like:
    Expression property =
        ff.property("gsml:specification/gsml:GeologicUnit/gml:description");
    Filter filter = ff.like(property, "Olivine basalt");
2) A call is made to MappingFeatureSource.getFeatures(filter). This doesn't
actually query the underlying datasources. It just returns a
MappingFeatureCollection which contains enough info to retrieve actual
features if it is iterated over.
3) MappingFeatureCollection.features is called to get an iterator for the
collection. This calls into MappingFeatureIteratorFactory.getInstance which
calls AppSchemaDataAccess.unrollQuery and then down to
AppSchemaDataAccess.unrollFilter.
4) AppSchemaDataAccess.unrollFilter creates an UnmappingFilterVisitor to
figure out all the possible underlying datasources that must be queried to
handle the filter. (While I haven't traced it entirely, I believe this is
to handle polymorphic mappings or it is for handling xpaths with multiple
resolutions. Any input here?) In any case, if UnmappingFilterVisitor
determines that any part of the filter deals with a chained-feature, it
wraps up the returned filter in a MultiValuedOrImpl.
5) Popping back up to MappingFeatureIteratorFactory, if the returned filter
is a MultiValuedOrImpl, it returns a FilteringMappingFeatureIterator to the
MappingFeatureCollection which invoked it.
6) FilteringMappingFeatureIterator is where things start to slow down.
Rather than attempting to pass the filter (or parts of it) to the underlying
datastores, the FilteringMappingFeatureIterator will instead iterate over
*every* feature from the primary FeatureType defined in the app-schema
mappings file. It will resolve each feature one-by-one (e.g. loading it and
all of its sub-features from DB tables) and then pass the feature to the
unrolled filter's evaluate(feature) method. If the feature matches the
filter, it will be returned by the iterator's next() method. If not, the
iterator repeats the process for the next feature in the datasource.
This means that if the datasource (assume it's a DB) holds 10000 features
and you pass a complex filter, you will make at least 20000 calls to the DB:
10000 for the primary features and at least another 10000 for their chained
features.
This is the case even if the iterator ends up only returning a few features
in the end due to the rest being excluded by the filter. For the schema we
have (a simplified version of which was included in the tests for the
NestedAttributeExpression bug I submitted), response times become
unacceptable once you have more than a few hundred features.
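To make the cost concrete, here is a rough, self-contained sketch of the
scan-everything pattern described above. It does not use any GeoTools
classes: ScanCostSketch, loadPrimary and loadChained are hypothetical
stand-ins for the datastore calls, and the counter assumes exactly one query
per primary feature plus one per chained feature.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the scan-and-filter pattern; not GeoTools code. */
final class ScanCostSketch {
    /** Counts simulated datastore round trips. */
    static int dbCalls = 0;

    // Hypothetical stand-in for loading one primary feature row.
    static int loadPrimary(int id) {
        dbCalls++;
        return id;
    }

    // Hypothetical stand-in for resolving the chained feature of one parent.
    // In this toy data set only every 1000th parent matches the filter value.
    static String loadChained(int parentId) {
        dbCalls++;
        return (parentId % 1000 == 0) ? "Olivine basalt" : "Other";
    }

    /** Resolves every feature first, then filters in memory. */
    static List<Integer> scanAll(int featureCount, String wanted) {
        List<Integer> matches = new ArrayList<Integer>();
        for (int id = 0; id < featureCount; id++) {
            int primary = loadPrimary(id);
            String description = loadChained(primary);
            if (wanted.equals(description)) {
                matches.add(primary);
            }
        }
        return matches;
    }
}
```

In this model, scanAll(10000, "Olivine basalt") returns 10 matches but still
performs 20000 simulated calls, which is the overhead the proposal below
tries to avoid.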
So here is a simplified version of what my proof-of-concept code does:
1) FilteringMappingFeatureIterator gets the original filter rather than the
unrolled one.
2) Within its initialiseSourceFeatures method, it uses some of the new
Filters utilities Jody and I have been working on to determine if the
primary filter is an and/or. If so, it will perform the following steps for
each child and intersect or union the results as appropriate.
3) Once we have a non-logic Filter (e.g. equals, like, intersects, etc), it
uses another new method I added to Filters to extract the property string
(referred to as attPath, below). (Side note - I'll submit the new
Filters.findPropertyName(Filter f) method when I get a chance, or I can just
email it to Jody.)
4) It then does something like this (simplified for clarity):
    // get all the steps in the path
    XPath.StepList steps = XPath.steps(mapping.getTargetFeature(),
            attPath, namespaces);
    // get all the mapping sources which match the first sub-level attribute
    List<NestedAttributeMapping> attMappings =
            mapping.getAttributeMappingsIgnoreIndex(steps.subList(0, 1));
    // Build a new filter that matches the old one except that it swaps in
    // the trimmed-down attribute path. I think it always needs to remove the
    // first two parts of the path: one for the property name in the root
    // feature and the next for the sub-feature name. The remaining path
    // should then always start with the property name in the sub-feature.
    // That is, unless some funky XPath is being used. Comments?
    Filter f = ...;
    Set ids = new LinkedHashSet();
    for (NestedAttributeMapping attMapping : attMappings) {
        List<Feature> featureList = attMapping.findParentFeatures(f);
        // pull out the ids of the parent features from the returned list
        Set curIds = ...;
        ids.addAll(curIds);
    }
5) The new method NestedAttributeMapping.findParentFeatures(Filter f) is
very similar to the existing NestedAttributeMapping.getFeatures() method,
except rather than returning all sub-features that match a foreign key
reference to a single parent feature, it returns all features that match the
passed in filter. It also limits the returned properties in the query to
only the property which is the foreign key to the parent feature so that it
doesn't pull excess data that will only be tossed out once we've gotten the
parent ids.
6) As noted in step #2, if the original filter was an and/or, the ids
returned from each sub-filter are intersected or unioned together as
appropriate.
7) Then an iterator over the final set of ids is created and stored.
initialiseSourceFeatures is then complete.
8) FilteringMappingFeatureIterator.hasNext() now simply calls
idIterator.hasNext()
9) FilteringMappingFeatureIterator.next() now calls idIterator.next() and
then retrieves the full feature from the MappingFeatureSource using a simple
id query.
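Steps 7 to 9 above can be sketched roughly as follows. This is only an
illustration under stated assumptions, not the actual prototype:
IdDrivenIterator is a hypothetical class, and fetchById stands in for
whatever simple id query against the MappingFeatureSource ends up being
used.

```java
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Function;

/** Toy model of an id-driven iterator; fetchById is a hypothetical id query. */
final class IdDrivenIterator<F> implements Iterator<F> {
    private final Iterator<String> idIterator;
    private final Function<String, F> fetchById;

    IdDrivenIterator(Set<String> parentIds, Function<String, F> fetchById) {
        // Copy so later changes to the caller's set do not affect iteration.
        this.idIterator = new LinkedHashSet<String>(parentIds).iterator();
        this.fetchById = fetchById;
    }

    @Override
    public boolean hasNext() {
        // No feature is resolved just to answer hasNext().
        return idIterator.hasNext();
    }

    @Override
    public F next() {
        // Only features whose ids survived the filter are ever resolved.
        return fetchById.apply(idIterator.next());
    }
}
```

The point of this shape is that the number of full feature lookups equals
the number of matching ids, rather than the number of features in the
datasource.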
I have left out some parts, like handling sub-filters that query simple
attributes of the primary feature rather than chained-attributes, but you
get the general idea. Also, while I haven't verified it yet, I believe this
approach will work recursively so that multi-step feature chains can be
queried in the same way. The ids for the parents of the lowest level
contained feature will be returned and these mid-level features will, in
turn, be pulled to then retrieve the ids for the next level up, etc.
Similarly, multi-level logic filters (e.g. or(and(f1,f2), f3)) should work
as well.
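As a rough illustration of how the id sets for nested logic filters could be
combined, here is a small sketch that does not depend on GeoTools.
IdSetCombiner and its Node type are hypothetical; each leaf is assumed to
already carry the parent ids returned for its sub-filter.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model of combining parent-id sets for nested and/or filters. */
final class IdSetCombiner {
    /** Hypothetical filter node: a leaf with precomputed ids, or AND/OR. */
    static final class Node {
        final String op; // "AND", "OR" or "LEAF"
        final List<Node> children;
        final Set<String> leafIds;

        Node(String op, List<Node> children, Set<String> leafIds) {
            this.op = op;
            this.children = children;
            this.leafIds = leafIds;
        }
    }

    static Node leaf(String... ids) {
        return new Node("LEAF", null, new LinkedHashSet<String>(Arrays.asList(ids)));
    }

    static Node and(Node... children) {
        return new Node("AND", Arrays.asList(children), null);
    }

    static Node or(Node... children) {
        return new Node("OR", Arrays.asList(children), null);
    }

    /** Recursively intersects (AND) or unions (OR) the ids of the children. */
    static Set<String> resolve(Node node) {
        if ("LEAF".equals(node.op)) {
            return new LinkedHashSet<String>(node.leafIds);
        }
        Set<String> result = null;
        for (Node child : node.children) {
            Set<String> childIds = resolve(child);
            if (result == null) {
                result = childIds;
            } else if ("AND".equals(node.op)) {
                result.retainAll(childIds);
            } else {
                result.addAll(childIds);
            }
        }
        // Assumption: a logic node with no children yields no ids here.
        return result == null ? new LinkedHashSet<String>() : result;
    }
}
```

For or(and(f1,f2), f3) with f1 = {a, b, c}, f2 = {b, c, d} and f3 = {e},
resolve yields {b, c, e}.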
What I'm most interested in is whether anyone sees major problems with this
approach, at least for the primary use case where direct paths to
chained-attributes are used in a filter? I'm not sure what would happen if
more complex XPath notation is used. In that situation, the current
behavior of FilteringMappingFeatureIterator would likely still be required.
But even so, if the MappingFeatureIteratorFactory can determine when a
filter matches the "direct path" use case and uses this modified
FilteringMappingFeatureIterator, then the app-schema plugin could actually
be used against very large datasets. Unless my whole assumption that people
want to use such filters is wrong... I don't think we're the only ones who
would want to do this.
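One possible way to recognise the "direct path" case is a purely textual
check on the property path, sketched below. This is an assumption on my
part rather than existing plugin behaviour: DirectPathSketch, isDirectPath
and trimToNestedPath are hypothetical names, and the check simply rejects
any step containing predicates, indexes, attributes or wildcards.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

/** Toy check for simple "prefix:name/prefix:name/..." property paths. */
final class DirectPathSketch {
    // A simple step is an optional prefix plus a local name, nothing else.
    private static final Pattern SIMPLE_STEP =
            Pattern.compile("([A-Za-z_][\\w.-]*:)?[A-Za-z_][\\w.-]*");

    /** Returns true only if every step in the path is a simple step. */
    static boolean isDirectPath(String xpath) {
        if (xpath == null || xpath.isEmpty()) {
            return false;
        }
        for (String step : xpath.split("/", -1)) {
            if (!SIMPLE_STEP.matcher(step).matches()) {
                return false;
            }
        }
        return true;
    }

    /**
     * Drops the first two steps (root property name and sub-feature name).
     * Returns null if the path has fewer than three steps.
     */
    static String trimToNestedPath(String xpath) {
        String[] steps = xpath.split("/", -1);
        if (steps.length < 3) {
            return null;
        }
        return String.join("/", Arrays.copyOfRange(steps, 2, steps.length));
    }
}
```

With the path from the FeatureChainingTest example,
"gsml:specification/gsml:GeologicUnit/gml:description", isDirectPath returns
true and trimToNestedPath returns "gml:description". A path with an index
such as "gsml:specification[1]/..." would fall back to the current
behaviour.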
Please let me know what you think.
Thanks,
e
--
View this message in context:
http://osgeo-org.1803224.n2.nabble.com/Speeding-up-feature-chaining-filters-in-app-schema-plugin-tp6311949p6311949.html
Sent from the geotools-devel mailing list archive at Nabble.com.