[ 
https://issues.apache.org/jira/browse/LUCENE-8888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ignacio Vera resolved LUCENE-8888.
----------------------------------
       Resolution: Fixed
         Assignee: Ignacio Vera
    Fix Version/s: 8.2
                   master (9.0)

> Improve distribution of points with data dimension in BKD tree leaves
> ---------------------------------------------------------------------
>
>                 Key: LUCENE-8888
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8888
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Ignacio Vera
>            Assignee: Ignacio Vera
>            Priority: Major
>             Fix For: master (9.0), 8.2
>
>          Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> LUCENE-8688 introduced a new storage strategy for leaves that contain 
> duplicated points. This works well with indexed dimensions, because the 
> process of partitioning the space and the final sorting of leaves group 
> together points with equal indexed dimensions.
> This is not always the case if the points contain data dimensions. If two 
> points have the same indexed dimensions but different data dimensions, 
> their distribution across the leaves may not be optimal.
> A good example is a user indexing a bounding box with LatLonShape. The 
> resulting tessellation of a bounding box is two triangles with the same 
> indexed dimensions but different data dimensions. If two documents index 
> the same bounding box, the leaf contains the triangles from one document 
> followed by the triangles from the second document. This is because the 
> current sorting/selection algorithms use one indexed dimension and 
> tie-break on the docID.
> The optimal distribution in the case above is to group the equal triangles 
> together. Therefore, the proposal here is to update the selection/sorting 
> algorithms to use the data dimensions, when they exist, as tie-breakers 
> before using the docID.
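
The tie-breaking change described above can be illustrated with a small, self-contained sketch. This is not the Lucene implementation: the `Point` class, its integer fields, and both comparators are hypothetical stand-ins, assuming a single indexed dimension and a single data dimension per point. It only shows how the ordering changes when the data dimension is consulted before the docID.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TieBreakSketch {

    /** Hypothetical stand-in for a BKD point; not a Lucene class. */
    public static final class Point {
        public final int indexedDim; // value of the indexed dimension used for sorting
        public final int dataDim;    // data-only dimension (here: which triangle it is)
        public final int docID;

        public Point(int indexedDim, int dataDim, int docID) {
            this.indexedDim = indexedDim;
            this.dataDim = dataDim;
            this.docID = docID;
        }
    }

    /** Ordering described as the existing behaviour: indexed dimension, then docID. */
    public static final Comparator<Point> INDEXED_THEN_DOC =
        Comparator.<Point>comparingInt(p -> p.indexedDim)
            .thenComparingInt(p -> p.docID);

    /** Proposed ordering: indexed dimension, then data dimension, then docID. */
    public static final Comparator<Point> INDEXED_DATA_THEN_DOC =
        Comparator.<Point>comparingInt(p -> p.indexedDim)
            .thenComparingInt(p -> p.dataDim)
            .thenComparingInt(p -> p.docID);

    /** Two documents indexing the same bounding box: two triangles each. */
    public static List<Point> sample() {
        List<Point> points = new ArrayList<>();
        points.add(new Point(7, 1, 0)); // doc 0, first triangle
        points.add(new Point(7, 2, 0)); // doc 0, second triangle
        points.add(new Point(7, 1, 1)); // doc 1, first triangle
        points.add(new Point(7, 2, 1)); // doc 1, second triangle
        return points;
    }

    /** Returns a sorted copy of the sample points using the given ordering. */
    public static List<Point> sorted(Comparator<Point> order) {
        List<Point> points = sample();
        points.sort(order); // List.sort is stable, so full ties keep insertion order
        return points;
    }
}
```

With `INDEXED_THEN_DOC`, the two triangles of document 0 come first, followed by those of document 1, so equal triangles are interleaved. With `INDEXED_DATA_THEN_DOC`, the equal triangles from both documents end up adjacent, which is the grouping the duplicate-aware leaf storage benefits from.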



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

