David Smiley created LUCENE-4942:
------------------------------------
Summary: Indexed non-point shapes index excessive terms
Key: LUCENE-4942
URL: https://issues.apache.org/jira/browse/LUCENE-4942
Project: Lucene - Core
Issue Type: Improvement
Components: modules/spatial
Reporter: David Smiley
Indexed non-point shapes are composed of a set of terms that represent grid
cells. Cells completely within the shape, or cells on the intersecting edge
that are at the maximum detail depth being indexed for the shape, are denoted as
"leaf" cells. The tokens for such cells have a trailing '+' at the end. _Such
tokens are actually indexed twice_, once with the leaf byte and once without.
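To make the duplication concrete, here is a minimal sketch of the term generation described above. It is a toy model, not Lucene's code: the Cell class, both method names, and the "trimmed" scheme (each leaf cell keeps only its '+' form) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: Cell and both methods are hypothetical, not Lucene classes.
class LeafTermSketch {

    // A grid cell: its prefix token and whether it is a leaf cell.
    static final class Cell {
        final String token;
        final boolean leaf;

        Cell(String token, boolean leaf) {
            this.token = token;
            this.leaf = leaf;
        }
    }

    // Behavior described in this issue: every cell emits its plain token,
    // and a leaf cell additionally emits the token with a trailing '+'.
    static List<String> termsAsDescribed(List<Cell> cells) {
        List<String> terms = new ArrayList<>();
        for (Cell c : cells) {
            terms.add(c.token);
            if (c.leaf) {
                terms.add(c.token + "+");
            }
        }
        return terms;
    }

    // Assumed trimmed scheme: one term per cell; leaf cells keep only the '+' form.
    static List<String> termsTrimmed(List<Cell> cells) {
        List<String> terms = new ArrayList<>();
        for (Cell c : cells) {
            terms.add(c.leaf ? c.token + "+" : c.token);
        }
        return terms;
    }
}
```

For the cells "9q" (not a leaf), "9q8" (leaf) and "9q9" (leaf), termsAsDescribed returns five terms and termsTrimmed returns three.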
The TermQuery based PrefixTree Strategy doesn't consider the notion of 'leaf'
cells and so the tokens with '+' are completely redundant.
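The redundancy can be sketched with a simplified matcher. This assumes, purely for illustration, that a term-lookup search only ever asks for plain cell tokens (never ones ending in '+'); the class and method names are hypothetical and this is not the strategy's real implementation.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: hypothetical names, not Lucene's TermQuery-based strategy.
class PlainTokenMatchSketch {

    // A document matches if any plain query token equals one of its indexed terms.
    static boolean matches(Set<String> indexedTerms, List<String> queryTokens) {
        for (String token : queryTokens) {
            if (indexedTerms.contains(token)) {
                return true;
            }
        }
        return false;
    }

    // Returns a copy of the indexed terms without the '+'-suffixed leaf terms.
    static Set<String> dropLeafTerms(Set<String> indexedTerms) {
        Set<String> kept = new HashSet<>();
        for (String term : indexedTerms) {
            if (!term.endsWith("+")) {
                kept.add(term);
            }
        }
        return kept;
    }
}
```

Under that assumption, matches gives the same answer for a set of indexed terms and for dropLeafTerms of that set, because no query token can ever equal a term that ends in '+'.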
The Recursive [algorithm] based PrefixTree Strategy better supports correct
search of indexed non-point shapes than TermQuery does and the distinction is
relevant. However, the foundational search algorithms used by this strategy
(Intersects & Contains; the other two are based on these) could each be upgraded
to work correctly without the duplicate tokens. Not trivial, but very doable.
In the end, spatial non-point indexes can probably be trimmed by ~40% by doing
this.
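As a rough way to reason about that figure, the sketch below computes the fraction of terms saved under the assumption that trimming removes exactly one term per leaf cell. The class and method names are hypothetical, and the formula is an illustration, not a measurement.

```java
// Toy arithmetic only: assumes one redundant term per leaf cell.
class TermSavingsSketch {

    // totalCells: all cells indexed for a shape; leafCells: how many of them are leaves.
    static double savedFraction(int totalCells, int leafCells) {
        if (totalCells <= 0 || leafCells < 0 || leafCells > totalCells) {
            throw new IllegalArgumentException("need 0 <= leafCells <= totalCells and totalCells > 0");
        }
        int termsBefore = totalCells + leafCells; // each leaf is indexed twice
        return (double) leafCells / termsBefore;  // one term dropped per leaf
    }
}
```

Under this model a saving of about 40% corresponds to roughly two leaf cells out of every three cells, and the saving can never exceed 50%.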