[ 
https://issues.apache.org/jira/browse/LUCENE-4942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900563#comment-13900563
 ] 

David Smiley commented on LUCENE-4942:
--------------------------------------

Somewhat related to this is my newfound realization that indexed non-point 
shapes will result in IntersectsPrefixTreeFilter (technically it's actually 
VisitorTemplate) scanning over these smallest grid cells / terms *twice* and 
thus calculating intersection *twice* -- once with the leaf flag, once without.  
This is likely a major performance bug.  It would be awkward to fix that right 
now, but it would be easy once there simply wasn't this redundant indexing of 
terms -- hence this issue.
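
To make the double scan concrete, here is a toy sketch. It does not use the
real Lucene classes: LeafScanSketch, intersects() and scan() are hypothetical
names, the terms are plain strings, and the prefix comparison only stands in
for the real (expensive) shape intersection. It assumes a leaf cell is indexed
both as "ABC" and as "ABC+", and counts how often the intersection test runs.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only; none of these names exist in Lucene.
final class LeafScanSketch {
    static int intersectionChecks = 0;

    // Hypothetical stand-in for an expensive cell-vs-query-shape test.
    static boolean intersects(String cellToken, String queryPrefix) {
        intersectionChecks++;
        return cellToken.startsWith(queryPrefix) || queryPrefix.startsWith(cellToken);
    }

    // Visits every indexed term in order and returns the number of
    // intersection tests performed for this scan.
    static int scan(List<String> indexedTerms, String queryPrefix) {
        intersectionChecks = 0;
        for (String term : indexedTerms) {
            // Strip the assumed trailing '+' leaf flag to recover the cell.
            String cellToken = term.endsWith("+")
                    ? term.substring(0, term.length() - 1)
                    : term;
            intersects(cellToken, queryPrefix);
        }
        return intersectionChecks;
    }

    public static void main(String[] args) {
        // Leaf cells ABC and ABD indexed twice: with and without the flag.
        List<String> redundant = Arrays.asList("AB", "ABC", "ABC+", "ABD", "ABD+");
        // Same cells with one term per cell.
        List<String> trimmed = Arrays.asList("AB", "ABC+", "ABD+");

        System.out.println(scan(redundant, "ABC")); // prints 5
        System.out.println(scan(trimmed, "ABC"));   // prints 3
    }
}
```

In this model each leaf cell costs two intersection tests instead of one,
which is the effect described above.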

> Indexed non-point shapes index excessive terms
> ----------------------------------------------
>
>                 Key: LUCENE-4942
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4942
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/spatial
>            Reporter: David Smiley
>
> Indexed non-point shapes are comprised of a set of terms that represent grid 
> cells.  Cells completely within the shape or cells on the intersecting edge 
> that are at the maximum detail depth being indexed for the shape are denoted 
> as "leaf" cells.  Such cells have a trailing '+' at the end.  _Such tokens 
> are actually indexed twice_, once with the leaf byte and once without.
> The TermQuery based PrefixTree Strategy doesn't consider the notion of 'leaf' 
> cells and so the tokens with '+' are completely redundant.
> The Recursive [algorithm] based PrefixTree Strategy better supports correct 
> search of indexed non-point shapes than TermQuery does and the distinction is 
> relevant.  However, the foundational search algorithms used by this strategy 
> (Intersects & Contains; the other 2 are based on these) could each be 
> upgraded to deal with this correctly.  Not trivial but very doable.
> In the end, spatial non-point indexes can probably be trimmed by ~40% by 
> doing this.
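
The term redundancy in the description can be sketched as follows. This is a
minimal illustration under stated assumptions, not the actual indexing code:
LeafTermSketch, Cell and both methods are hypothetical names, and the trimmed
scheme shown (keep only the '+' form for leaf cells) is just one possible way
to remove the duplication. The real savings depend on the shapes indexed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only; none of these names exist in Lucene.
final class LeafTermSketch {
    // Hypothetical stand-in for a grid cell: its token and a leaf flag.
    static final class Cell {
        final String token;
        final boolean leaf;

        Cell(String token, boolean leaf) {
            this.token = token;
            this.leaf = leaf;
        }
    }

    // Behavior described in the issue: a leaf cell yields two terms,
    // one without the flag and one with the trailing '+'.
    static List<String> termsWithRedundancy(List<Cell> cells) {
        List<String> terms = new ArrayList<>();
        for (Cell c : cells) {
            terms.add(c.token);
            if (c.leaf) {
                terms.add(c.token + "+");
            }
        }
        return terms;
    }

    // Assumed trimmed scheme: exactly one term per cell.
    static List<String> termsWithoutRedundancy(List<Cell> cells) {
        List<String> terms = new ArrayList<>();
        for (Cell c : cells) {
            terms.add(c.leaf ? c.token + "+" : c.token);
        }
        return terms;
    }

    public static void main(String[] args) {
        List<Cell> cells = Arrays.asList(
                new Cell("A", false),
                new Cell("AB", false),
                new Cell("ABC", true),
                new Cell("ABD", true));

        System.out.println(termsWithRedundancy(cells));    // prints [A, AB, ABC, ABC+, ABD, ABD+]
        System.out.println(termsWithoutRedundancy(cells)); // prints [A, AB, ABC+, ABD+]
    }
}
```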



