[jira] [Commented] (LUCENE-4942) Indexed non-point shapes index excessive terms

David Smiley (JIRA) Fri, 06 Mar 2015 09:10:36 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-4942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14350585#comment-14350585
 ]


David Smiley commented on LUCENE-4942:
--------------------------------------

I'd like to solve this issue (excessive terms) and _also_ differentiate 
between fully-contained leaves and approximated leaves (for LUCENE-5776) in 
one go, tracked on this issue, so that we deal with back-compat only once.  
That is, we change how the PrefixTree derivative strategies encode the term 
data a single time, instead of doing so across more than one issue.  I'm 
thinking that on trunk we wouldn't worry about back-compat (it is trunk after 
all), and the port to 5x would then have to consider it -- the down-side 
being that some spatial code may vary a bit between trunk and 5x.  Perhaps 
the back-compat detection in 5x would work via a check for Version, similar 
to how an Analyzer has a version property that can optionally be set.

I'm not sure how contentious it may be to simply forgo back-compat.  _Just_ 
re-index.  And you're not affected if all you have is point data, which seems 
to be at least 80% of the users using spatial.

> Indexed non-point shapes index excessive terms
> ----------------------------------------------
>
>                 Key: LUCENE-4942
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4942
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/spatial
>            Reporter: David Smiley
>            Assignee: David Smiley
>
> Indexed non-point shapes are comprised of a set of terms that represent grid 
> cells.  Cells completely within the shape or cells on the intersecting edge 
> that are at the maximum detail depth being indexed for the shape are denoted 
> as "leaf" cells.  Such cells have a trailing '+' at the end.  _Such tokens 
> are actually indexed twice_, once with the leaf byte and once without.
> The TermQuery based PrefixTree Strategy doesn't consider the notion of 'leaf' 
> cells and so the tokens with '+' are completely redundant.
> The Recursive [algorithm] based PrefixTree Strategy better supports correct 
> search of indexed non-point shapes than TermQuery does and the distinction is 
> relevant.  However, the foundational search algorithms used by this strategy 
> (Intersects & Contains; the other 2 are based on these) could each be 
> upgraded to deal with this correctly.  Not trivial but very doable.
> In the end, spatial non-point indexes can probably be trimmed by ~40% by 
> doing this.
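
To illustrate the redundancy described in the issue, here is a minimal sketch 
in plain Java.  It does not use the Lucene spatial API: the Cell class, the 
method names, and the sample tokens are all hypothetical.  It only contrasts 
emitting a leaf cell's token twice (with and without the trailing '+') against 
one possible alternative that emits each cell once, which is an assumption 
about the fix and not the committed design.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LeafTermSketch {
    // Hypothetical stand-in for a grid cell: its token plus a leaf flag.
    static final class Cell {
        final String token;
        final boolean leaf;

        Cell(String token, boolean leaf) {
            this.token = token;
            this.leaf = leaf;
        }
    }

    // The trailing marker the issue describes for leaf cells.
    static final char LEAF_MARKER = '+';

    // Behavior described in the issue: a leaf cell yields two terms,
    // one without the marker and one with it.
    static List<String> doubledTerms(List<Cell> cells) {
        List<String> terms = new ArrayList<>();
        for (Cell c : cells) {
            terms.add(c.token);
            if (c.leaf) {
                terms.add(c.token + LEAF_MARKER);
            }
        }
        return terms;
    }

    // Assumed alternative: each cell yields exactly one term, and the
    // marker alone distinguishes leaf cells from non-leaf cells.
    static List<String> singleTerms(List<Cell> cells) {
        List<String> terms = new ArrayList<>();
        for (Cell c : cells) {
            terms.add(c.leaf ? c.token + LEAF_MARKER : c.token);
        }
        return terms;
    }

    public static void main(String[] args) {
        // Made-up tokens: two ancestor cells and two leaf cells.
        List<Cell> cells = Arrays.asList(
                new Cell("9", false),
                new Cell("9q", false),
                new Cell("9q8", true),
                new Cell("9q9", true));
        System.out.println("doubled: " + doubledTerms(cells));
        System.out.println("single:  " + singleTerms(cells));
    }
}
```

With this made-up input the doubled encoding produces six terms and the 
single encoding four.  The actual saving depends on the ratio of leaf to 
non-leaf cells in a real index, so the figures here say nothing about the 
~40% estimate.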


