[
https://issues.apache.org/jira/browse/LUCENE-4869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
David Smiley updated LUCENE-4869:
---------------------------------
Description:
LUCENE-4644 implemented the "IsWithin" predicate for a RecursivePrefixTree
based field. It's slow since it looks across the whole world to ensure it
doesn't match docs with data anywhere outside the query shape. It can be
configured to only look outside the query shape using a very small buffer
distance, and that will filter out documents spanning the query shape boundary,
but not indexed shapes composed of multiple disjoint parts. The solution
proposed here is to index a point per disjoint part in such a way that it can
be easily retrieved (e.g. DocValues) and then a post-process of
WithinPrefixTreeFilter would remove false-positives.
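
A minimal sketch of that post-process, under stated assumptions: the query
shape is simplified to an axis-aligned rectangle with no dateline wrap, and
each document carries one representative point per disjoint part. The names
WithinPostFilter, Rect, and allPartsWithin are hypothetical and are not part
of Lucene or Spatial4j. The idea is that once the narrow buffer has rejected
shapes crossing the query boundary, each remaining part lies wholly inside or
wholly outside the query, so testing one point per part should be enough.

```java
import java.util.List;

// Hypothetical illustration only; not Lucene's WithinPrefixTreeFilter.
final class WithinPostFilter {

    /** Assumed query shape: an axis-aligned rectangle, no dateline wrap. */
    static final class Rect {
        final double minX, maxX, minY, maxY;

        Rect(double minX, double maxX, double minY, double maxY) {
            this.minX = minX;
            this.maxX = maxX;
            this.minY = minY;
            this.maxY = maxY;
        }

        boolean contains(double x, double y) {
            return x >= minX && x <= maxX && y >= minY && y <= maxY;
        }
    }

    /**
     * Returns true only if every per-part point {x, y} is inside the query.
     * A document with any part outside the query is a false positive.
     */
    static boolean allPartsWithin(Rect query, List<double[]> partPoints) {
        for (double[] p : partPoints) {
            if (!query.contains(p[0], p[1])) {
                return false; // one disjoint part lies outside the query
            }
        }
        return true;
    }
}
```

A document accepted by the narrow-buffer filter would be kept only when
allPartsWithin returns true for its cached points.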
This isn't particularly hard/advanced, but it requires advances in a few
APIs that aren't quite there yet. Spatial4j's ShapeCollection (aka WKT
GeometryCollection or Multi*) needs to be released, and it needs a vertex
iterator. There needs to be code to read and write a set of points to a
BinaryDocValues field (1/doc). And finally, of course, WithinPrefixTreeFilter
needs to have a mode in which it uses the smallest buffer and then in the end
checks the DocValues to remove false-positives.
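
As a rough sketch of the read/write piece, the following packs a set of
points into a single byte array, one array per document. The layout (an int
count followed by x/y doubles) and the PointSetCodec name are assumptions made
for illustration; the issue does not specify an encoding. Wiring the bytes
into an actual BinaryDocValues field is left out.

```java
import java.nio.ByteBuffer;

// Hypothetical codec; the byte layout is an assumption, not a Lucene format.
final class PointSetCodec {

    /** Encodes points (each {x, y}) as: int count, then x and y as doubles. */
    static byte[] encode(double[][] points) {
        ByteBuffer buf = ByteBuffer.allocate(
                Integer.BYTES + points.length * 2 * Double.BYTES);
        buf.putInt(points.length);
        for (double[] p : points) {
            buf.putDouble(p[0]);
            buf.putDouble(p[1]);
        }
        return buf.array();
    }

    /** Decodes a byte array produced by encode back into {x, y} pairs. */
    static double[][] decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int count = buf.getInt();
        double[][] points = new double[count][2];
        for (int i = 0; i < count; i++) {
            points[i][0] = buf.getDouble();
            points[i][1] = buf.getDouble();
        }
        return points;
    }
}
```

Fixed-width doubles keep the sketch simple; a real implementation would
likely choose a more compact encoding.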
was:
LUCENE-4644 adds a useful initial capability to implement the "Within"
predicate for a RecursivePrefixTree based field. But it will match
false-positives for indexed shapes comprised of multiple disjoint parts. The
solution to be worked out here is to index a point per disjoint part in such a
way that it can be easily retrieved (e.g. DocValues) and then a post-process to
WithinPrefixTreeFilter would remove false-positives.
I didn't call this a 'bug' because this addresses a known temporary limitation,
and Within is still useful despite this.
Summary: Optimize IsWithin spatial RPT to use a point cache for
false-positive removal (was: Fix the Within spatial predicate PrefixTree to
remove false-positives)
> Optimize IsWithin spatial RPT to use a point cache for false-positive removal
> -----------------------------------------------------------------------------
>
> Key: LUCENE-4869
> URL: https://issues.apache.org/jira/browse/LUCENE-4869
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/spatial
> Reporter: David Smiley