[
https://issues.apache.org/jira/browse/LUCENE-8888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ignacio Vera resolved LUCENE-8888.
----------------------------------
Resolution: Fixed
Assignee: Ignacio Vera
Fix Version/s: 8.2, master (9.0)
> Improve distribution of points with data dimension in BKD tree leaves
> ---------------------------------------------------------------------
>
> Key: LUCENE-8888
> URL: https://issues.apache.org/jira/browse/LUCENE-8888
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Ignacio Vera
> Assignee: Ignacio Vera
> Priority: Major
> Fix For: master (9.0), 8.2
>
> Time Spent: 2h 10m
> Remaining Estimate: 0h
>
> In LUCENE-8688 a new storing strategy was introduced for leaves that contain
> duplicated points. This works well with indexed dimensions, as the process of
> partitioning the space and the final sorting of leaves group points with equal
> indexed dimensions together.
> This is not always the case if the points contain data dimensions. If two
> points have the same indexed dimensions but different data dimensions, their
> distribution in the leaves may not be optimal.
> A good example is a user indexing a bounding box using LatLonShape.
> The resulting tessellation of a bounding box is two triangles with the same
> indexed dimensions but different data dimensions. If two documents index
> the same bounding box, the leaf contains the triangles from
> one document followed by the triangles of the second document. This is
> because the current sorting/selection algorithms use one indexed dimension
> and tie-break on the docID.
> The optimal distribution in the case above is to group together the
> equal triangles. Therefore, what is proposed here is to update the
> selection/sorting algorithms to use the data dimensions, when they exist, as
> tie-breakers before using the docID.
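
The tie-breaking order described in the issue can be illustrated with a small, self-contained Java sketch. This is not the code committed for LUCENE-8888; the `Point` class, the `order` comparator, its parameters (`sortDim`, `numIndexDims`, `numTotalDims`, `bytesPerDim`) and the one-byte dimension values are all hypothetical and chosen only for illustration. It compares points on one indexed dimension, then optionally on the data-dimension bytes, and only then on the docID.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class DataDimTieBreak {
    /** Hypothetical stand-in for one point: packed dimension bytes plus the owning docID. */
    static final class Point {
        final byte[] packed;
        final int docID;

        Point(int docID, int... dims) {
            this.docID = docID;
            this.packed = new byte[dims.length];
            for (int i = 0; i < dims.length; i++) {
                packed[i] = (byte) dims[i];
            }
        }
    }

    /**
     * Orders points by one indexed dimension, then (optionally) by all
     * data-dimension bytes, and finally by docID. Assumes the data dimensions
     * are stored after the indexed dimensions in the packed value.
     */
    static Comparator<Point> order(int sortDim, int numIndexDims, int numTotalDims,
                                   int bytesPerDim, boolean useDataDims) {
        return (a, b) -> {
            int from = sortDim * bytesPerDim;
            int cmp = Arrays.compareUnsigned(a.packed, from, from + bytesPerDim,
                                             b.packed, from, from + bytesPerDim);
            if (cmp == 0 && useDataDims) {
                int dataFrom = numIndexDims * bytesPerDim;
                int dataTo = numTotalDims * bytesPerDim;
                cmp = Arrays.compareUnsigned(a.packed, dataFrom, dataTo,
                                             b.packed, dataFrom, dataTo);
            }
            return cmp != 0 ? cmp : Integer.compare(a.docID, b.docID);
        };
    }

    /** Sorts a toy leaf holding two documents that index the same bounding box. */
    static String sortedLeaf(boolean useDataDims) {
        List<Point> leaf = new ArrayList<>();
        // Each document contributes two triangles: same indexed value (5),
        // different data value (1 or 2).
        leaf.add(new Point(0, 5, 1));
        leaf.add(new Point(0, 5, 2));
        leaf.add(new Point(1, 5, 1));
        leaf.add(new Point(1, 5, 2));
        leaf.sort(order(0, 1, 2, 1, useDataDims));

        StringBuilder sb = new StringBuilder();
        for (Point p : leaf) {
            sb.append("doc").append(p.docID).append(":t").append(p.packed[1]).append(' ');
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(sortedLeaf(false)); // docID tie-break only
        System.out.println(sortedLeaf(true));  // data dimensions, then docID
    }
}
```

With the docID-only tie-break the leaf alternates between the two triangles (`t1 t2 t1 t2`); with the data dimensions as tie-breakers the equal triangles end up adjacent (`t1 t1 t2 t2`), which is the grouping the duplicate-point storage from LUCENE-8688 can take advantage of.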