[ 
https://issues.apache.org/jira/browse/LUCENE-7110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16336024#comment-16336024
 ] 

Nicholas Knize commented on LUCENE-7110:
----------------------------------------

{quote}If I understand correctly, leaf nodes would store the entire 
shape.{quote}

Not exactly. For shapes I still think using ranges with doc value shapes, or 
(as suggested) approximating w/ raster cells is the right way to go. These 
shapes can get awfully large and storing them as the leaf nodes I think is a 
non-starter. Maybe I should back up a bit. I'm actually just proposing either a 
new codec (or inheriting from/extending the existing Points codec) that's 
designed and optimized to index and search ranges/rectangles. I don't want, nor 
I think its worth it, to get hung up on the geo use-case or complex polygon 
shapes at the moment. Right now we have the BKD structure. Which is designed 
for point data only. I then created a franken-encoding for Range fields that 
basically tricks BKD into thinking dimensional ranges are points. It works, but 
we can do better on the tree organization for ranges by considering other 
criteria during split. This is really all I am and should be proposing at the 
moment. We can save the shape indexing and geo use cases for a later 
discussion. Right now I think its worthwhile sandboxing an R-Tree based codec 
or modification to Points. Once that's done, I'll provide some benchmarks to 
compare ranges encoded using the Points structure vs. encoded using the R-Tree 
structure. Thoughts?

> Add Shape Support to BKD (extend to an R*/X-Tree data structure)
> ----------------------------------------------------------------
>
>                 Key: LUCENE-7110
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7110
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Nicholas Knize
>            Priority: Major
>
> I've been tinkering with this off and on for a while and its showing some 
> promise so I'm going to open an issue to (eventually) add this feature to a 
> future release.
> R*/X-Tree is a data structure designed to support Shapes (2D, 3D, nD) where, 
> like the internal node, the key for each leaf node is the Minimum Bounding 
> Range (MBR - sometimes "incorrectly" referred to as Minimum Bounding 
> Rectangle) of the shape. Inserting a shape then boils down to the best way of 
> optimizing the tree structure. This optimization is driven by a set of 
> criteria for choosing the appropriate internal key (e.g., minimizing overlap 
> between siblings, maximizing "squareness", minimizing area, maximizing space 
> usage). Query is then (a bit oversimplified) a two-phase process:
>  * recurse each branch that overlaps with the MBR of the query shape
>  * compute the relation with the leaf node(s) - in higher dimensions (3+) 
> this becomes an increasingly difficult computational geometry problem
> The current BKD implementation is a special simplified case of an R*/X tree 
> where, for Point data, it is always guaranteed there will never be overlap 
> between sibling nodes (because you're using the point data as the keys). By 
> exploiting this property the tree algorithms (split, merge, etc) are 
> relatively cheap (hence their performance boost over postings based 
> numerics). By modifying the key data, and extending the tree generation 
> algorithms BKD logic can be extended to support Shape data using the MBR as 
> the Key and modifying split and merge based on the criteria needed for 
> optimizing a shape-based data structure.
> The initial implementation (based on limitations of the GeoAPI) will support 
> 2D shapes only. Once the GeoAPI can performantly handle 3D shapes the change 
> is relatively trivial to add the third dimension to the tree generation code.
> Like everything else, this feature will be created in sandbox and, once 
> mature, will graduate (likely to lucene-spatial or lucene-spatial-extras 
> depending on the library needs).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to