[ 
https://issues.apache.org/jira/browse/LUCENE-7110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16336283#comment-16336283
 ] 

Nicholas Knize commented on LUCENE-7110:
----------------------------------------

Some really good ideas here. Thank you all for the feedback.

{quote}Then shouldn't we just add Range? It'd like to see it generic like 
points, e.g. it would support different dimensions (1D, 2D) ranges etc. I think 
it should be purely byte based and not bake in stuff like coordinate systems or 
any of that: the Query side can handle that, just like the Point queries handle 
it for their cases (box, radius, etc).{quote}

This is exactly what I was thinking. Coordinate system agnostic and optimized 
for ranges, boxes, cubes, etc. I'm really looking to see what everyone thinks 
about it being its own companion codec to Points vs. extending Points. I like 
the cleanliness and safety of it being its own thing and not mucking around 
with or conflating Points since it already works well for its intended use 
case. But I didn't want to head down that path if there are reasonable 
performance or maintenance concerns.

{quote}I'm curious if you know about algorithms to build balanced R-trees in an 
offline fashion?{quote}

Oh, I like where you're going with this. There are quite a few actually. Most 
of the offline R-Tree builders are designed for the case where all of the data 
is known in advance. I think it would be super efficient to build the 
sub-optimal random access R-Trees on a per segment basis (since data is rarely 
provided at once) and in the segment merge phase use the offline approach to 
optimize the tree. Is that along the lines of what you were thinking?

{quote} So the objective is better performance for indexed rectangles over what 
LatLonBoundingBox is able to do today...{quote}

Kind of. The objective is really better performance for all indexed ranges; not 
just {{LatLonBoundingBox}} but sibling types as well ({{DoubleRange}} 
{{FloatRange}} etc.)

{quote}Do you think the performance is bad enough to warrant the maintenance & 
complexities in what you describe?{quote}

I'd be lying if I said I had numbers to confirm and back up "bad enough". So 
I'm honestly not entirely sure until I get into some benchmarking. I do know 
there are evil cases (as always) where we could really optimize the tree to 
alleviate some of the linear scan issues we occasionally see today. Another big 
part of this is to really create the right tree/codec for the job; Points for 
Points and Range for Ranges. There will be a LOT of similarities between the 
two (since they're both height balanced trees) but it does open us up to really 
tweaking the tree construction algorithm (split/merge) around better data 
heuristics.


> Add Shape Support to BKD (extend to an R*/X-Tree data structure)
> ----------------------------------------------------------------
>
>                 Key: LUCENE-7110
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7110
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Nicholas Knize
>            Priority: Major
>
> I've been tinkering with this off and on for a while and its showing some 
> promise so I'm going to open an issue to (eventually) add this feature to a 
> future release.
> R*/X-Tree is a data structure designed to support Shapes (2D, 3D, nD) where, 
> like the internal node, the key for each leaf node is the Minimum Bounding 
> Range (MBR - sometimes "incorrectly" referred to as Minimum Bounding 
> Rectangle) of the shape. Inserting a shape then boils down to the best way of 
> optimizing the tree structure. This optimization is driven by a set of 
> criteria for choosing the appropriate internal key (e.g., minimizing overlap 
> between siblings, maximizing "squareness", minimizing area, maximizing space 
> usage). Query is then (a bit oversimplified) a two-phase process:
>  * recurse each branch that overlaps with the MBR of the query shape
>  * compute the relation with the leaf node(s) - in higher dimensions (3+) 
> this becomes an increasingly difficult computational geometry problem
> The current BKD implementation is a special simplified case of an R*/X tree 
> where, for Point data, it is always guaranteed there will never be overlap 
> between sibling nodes (because you're using the point data as the keys). By 
> exploiting this property the tree algorithms (split, merge, etc) are 
> relatively cheap (hence their performance boost over postings based 
> numerics). By modifying the key data, and extending the tree generation 
> algorithms BKD logic can be extended to support Shape data using the MBR as 
> the Key and modifying split and merge based on the criteria needed for 
> optimizing a shape-based data structure.
> The initial implementation (based on limitations of the GeoAPI) will support 
> 2D shapes only. Once the GeoAPI can performantly handle 3D shapes the change 
> is relatively trivial to add the third dimension to the tree generation code.
> Like everything else, this feature will be created in sandbox and, once 
> mature, will graduate (likely to lucene-spatial or lucene-spatial-extras 
> depending on the library needs).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to