[jira] [Commented] (LUCENE-8928) BKDWriter could make splitting decisions based on the actual range of values

Ignacio Vera (Jira) Tue, 24 Sep 2019 00:42:53 -0700


    [ 
https://issues.apache.org/jira/browse/LUCENE-8928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936519#comment-16936519
 ]


Ignacio Vera commented on LUCENE-8928:
--------------------------------------

I have played a bit more with this idea and wondered whether we need to compute 
exact bounds for every split. I modified [~jpountz]'s patch so that, instead of 
computing the bounds for every split, it computes them every N splits. This is 
controlled by a static property called {{SPLITS_BEFORE_EXACT_BOUNDS}}.

The patch can be found here: 
https://github.com/iverase/lucene-solr/commit/e63f8c73a86c46ec406143fcd0cb31a8371dfe63

My tests show that setting this value to 4 (compute exact bounds every 4 splits) 
reduces the indexing overhead to around 10% while keeping almost the same 
performance as the previous approach. Maybe we can find a better heuristic for 
setting this value.

In addition, this patch does not apply to dimensions <= 2; for those, the split 
algorithm reverts to the original one.
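To make the heuristic concrete, here is a minimal sketch, assuming a simplified in-memory BKD-style builder over int points rather than Lucene's actual BKDWriter. The names PeriodicBoundsSketch, MAX_POINTS_IN_LEAF, computeExactBounds, chooseSplitDim and buildTree are hypothetical; SPLITS_BEFORE_EXACT_BOUNDS only mirrors the property named above. The sketch assumes "every N splits" means every N tree levels, and it approximates the original algorithm as picking the widest dimension from bounds inherited from the parent and narrowed only at the split value.

```java
import java.util.Arrays;

public class PeriodicBoundsSketch {
    // Hypothetical stand-in for the SPLITS_BEFORE_EXACT_BOUNDS property in the comment.
    static final int SPLITS_BEFORE_EXACT_BOUNDS = 4;
    // Hypothetical leaf size, kept tiny so the example tree has a few levels.
    static final int MAX_POINTS_IN_LEAF = 2;

    static int exactBoundsComputations;

    /** Scans points[from, to) and writes the exact per-dimension min/max into min and max. */
    static void computeExactBounds(int[][] points, int from, int to, int[] min, int[] max) {
        exactBoundsComputations++;
        Arrays.fill(min, Integer.MAX_VALUE);
        Arrays.fill(max, Integer.MIN_VALUE);
        for (int i = from; i < to; i++) {
            for (int d = 0; d < min.length; d++) {
                min[d] = Math.min(min[d], points[i][d]);
                max[d] = Math.max(max[d], points[i][d]);
            }
        }
    }

    /** Picks the dimension whose current (exact or inherited) bounds are widest. */
    static int chooseSplitDim(int[] min, int[] max) {
        int best = 0;
        long bestSpan = -1;
        for (int d = 0; d < min.length; d++) {
            long span = (long) max[d] - min[d];
            if (span > bestSpan) {
                bestSpan = span;
                best = d;
            }
        }
        return best;
    }

    static void build(int[][] points, int from, int to, int[] min, int[] max, int splitCount) {
        if (to - from <= MAX_POINTS_IN_LEAF) {
            return; // leaf: no further splits
        }
        // Only with more than 2 dimensions, and only every N levels, pay for a full scan;
        // otherwise keep using the bounds inherited from the parent.
        if (min.length > 2 && splitCount > 0 && splitCount % SPLITS_BEFORE_EXACT_BOUNDS == 0) {
            computeExactBounds(points, from, to, min, max);
        }
        int dim = chooseSplitDim(min, max);
        Arrays.sort(points, from, to, (a, b) -> Integer.compare(a[dim], b[dim]));
        int mid = (from + to) >>> 1;
        int splitValue = points[mid][dim];
        // Children start from the parent's bounds, narrowed only on the split dimension.
        int[] leftMax = max.clone();
        leftMax[dim] = splitValue;
        int[] rightMin = min.clone();
        rightMin[dim] = splitValue;
        build(points, from, mid, min.clone(), leftMax, splitCount + 1);
        build(points, mid, to, rightMin, max.clone(), splitCount + 1);
    }

    /** Builds the tree and returns how many exact-bounds scans were performed. */
    static int buildTree(int[][] points) {
        exactBoundsComputations = 0;
        int numDims = points[0].length;
        int[] min = new int[numDims];
        int[] max = new int[numDims];
        computeExactBounds(points, 0, points.length, min, max); // root bounds are always exact
        build(points, 0, points.length, min, max, 0);
        return exactBoundsComputations;
    }

    public static void main(String[] args) {
        int[][] points = new int[64][];
        for (int i = 0; i < points.length; i++) {
            points[i] = new int[] {i, i + 5, (i * 7) % 64};
        }
        System.out.println("exact-bounds scans: " + buildTree(points));
    }
}
```

With 64 three-dimensional points, a leaf size of 2 and N = 4, the sketch scans once at the root and once per node at depth 4. Since the nodes on one level together hold all the points, each level that recomputes exact bounds costs one full pass over the data, so recomputing only every N levels cuts the number of extra passes by roughly a factor of N.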


> BKDWriter could make splitting decisions based on the actual range of values
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-8928
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8928
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>
> Currently BKDWriter assumes that splitting on one dimension has no effect on 
> values in other dimensions. While this may be ok for geo points, this is 
> usually not true for ranges (or geo shapes, which are ranges too). Maybe we 
> could get better indexing by re-computing the range of values on each 
> dimension before making the choice of the split dimension?
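A small illustration of the correlation described in the issue, assuming ranges are indexed as 2D points (min, max): after splitting on the min dimension, the max values in one half cover a much narrower range than the bounds inherited from the parent suggest. RangeCorrelationSketch and actualRange are hypothetical names, and the data is synthetic.

```java
import java.util.Arrays;

public class RangeCorrelationSketch {
    /** Returns {min, max} of dimension dim over points[from, to). */
    static int[] actualRange(int[][] points, int from, int to, int dim) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int i = from; i < to; i++) {
            min = Math.min(min, points[i][dim]);
            max = Math.max(max, points[i][dim]);
        }
        return new int[] {min, max};
    }

    public static void main(String[] args) {
        // Synthetic data: 100 ranges [i, i + 10], indexed as 2D points (min, max).
        int[][] points = new int[100][];
        for (int i = 0; i < 100; i++) {
            points[i] = new int[] {i, i + 10};
        }
        // Root cell bounds on dimension 1, computed exactly.
        int[] rootDim1 = actualRange(points, 0, points.length, 1);

        // Split on dimension 0 at the median.
        Arrays.sort(points, (a, b) -> Integer.compare(a[0], b[0]));
        int mid = points.length / 2;

        // A writer that treats dimensions as independent keeps the parent's dim-1 bounds...
        int[] inheritedLeftDim1 = rootDim1;
        // ...while the values actually present in the left cell span a much narrower range.
        int[] actualLeftDim1 = actualRange(points, 0, mid, 1);

        System.out.println("inherited dim1 range of left cell: " + Arrays.toString(inheritedLeftDim1));
        System.out.println("actual dim1 range of left cell:    " + Arrays.toString(actualLeftDim1));
    }
}
```

This prints an inherited range of [10, 109] against an actual range of [10, 59] for the left cell, so a split-dimension choice based on inherited bounds would overestimate the spread of the max dimension by about a factor of two here.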



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


