[
https://issues.apache.org/jira/browse/LUCENE-8928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16938429#comment-16938429
]
Ignacio Vera commented on LUCENE-8928:
--------------------------------------
I ran some benchmarks comparing this new approach with the previous one. They
show similar query performance (1% to 7% lower) but a much faster indexing
rate (25% to 29% less index time):
||Approach||Index time (sec)||Index time (sec)|| ||Force merge time (sec)||Force merge time (sec)|| ||Index size (GB)||Index size (GB)|| ||Reader heap (MB)||Reader heap (MB)|| ||
|| ||Dev||Base||Diff||Dev||Base||Diff||Dev||Base||Diff||Dev||Base||Diff||
|geo3d|163.5s|218.4s|-25%|0.0s|0.0s| 0%|0.71|0.71|-0%|1.75|1.75|-0%|
|shapes|227.8s|319.6s|-29%|0.0s|0.0s| 0%|1.27|1.27| 0%|1.78|1.78| 0%|
||Approach||Shape||M hits/sec||M hits/sec|| ||QPS||QPS|| ||Hit count||Hit count|| ||
|| || ||Dev||Base||Diff||Dev||Base||Diff||Dev||Base||Diff||
|geo3d|box|55.58|57.53|-3%|56.56|58.54|-3%|221118844|221118844| 0%|
|geo3d|polyRussia|0.56|0.56|-1%|0.16|0.16|-1%|3508671|3508671| 0%|
|geo3d|poly 10|48.87|51.25|-5%|30.90|32.41|-5%|355855227|355855227| 0%|
|geo3d|polyMedium|0.62|0.63|-1%|7.64|7.67|-1%|2693545|2693545| 0%|
|geo3d|distance|68.16|69.70|-2%|40.00|40.91|-2%|383371884|383371884| 0%|
|shapes|box|45.99|46.52|-1%|46.80|47.34|-1%|221118844|221118844| 0%|
|shapes|polyRussia|6.64|7.01|-5%|1.89|2.00|-5%|3508846|3508846| 0%|
|shapes|poly 10|33.40|34.69|-4%|21.12|21.93|-4%|355809475|355809475| 0%|
|shapes|polyMedium|3.07|3.30|-7%|37.62|40.43|-7%|2693559|2693559| 0%|
> BKDWriter could make splitting decisions based on the actual range of values
> ----------------------------------------------------------------------------
>
> Key: LUCENE-8928
> URL: https://issues.apache.org/jira/browse/LUCENE-8928
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
>
> Currently BKDWriter assumes that splitting on one dimension has no effect on
> values in other dimensions. While this may be ok for geo points, this is
> usually not true for ranges (or geo shapes, which are ranges too). Maybe we
> could get better indexing by re-computing the range of values on each
> dimension before making the choice of the split dimension?
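
To illustrate the idea in the issue description, here is a minimal Java sketch.
It is not the actual BKDWriter code: it assumes points are plain int arrays
(BKDWriter works on packed byte[] values), and the class and method names
(SplitDimSketch, computeBounds, widestDim) are hypothetical. It only shows how
bounds recomputed from the points in a cell can lead to a different split
dimension than bounds inherited from the parent cell.

```java
import java.util.Arrays;

/** Hypothetical sketch, not Lucene code: choose a split dimension from actual value ranges. */
class SplitDimSketch {

  /** Returns {min[], max[]} per dimension over points[from..to); assumes from < to. */
  static int[][] computeBounds(int[][] points, int from, int to) {
    int numDims = points[from].length;
    int[] min = new int[numDims];
    int[] max = new int[numDims];
    Arrays.fill(min, Integer.MAX_VALUE);
    Arrays.fill(max, Integer.MIN_VALUE);
    for (int i = from; i < to; i++) {
      for (int d = 0; d < numDims; d++) {
        min[d] = Math.min(min[d], points[i][d]);
        max[d] = Math.max(max[d], points[i][d]);
      }
    }
    return new int[][] {min, max};
  }

  /** Returns the dimension with the widest range; ties go to the lowest dimension. */
  static int widestDim(int[] min, int[] max) {
    int best = 0;
    long bestRange = -1;
    for (int d = 0; d < min.length; d++) {
      long range = (long) max[d] - min[d];
      if (range > bestRange) {
        bestRange = range;
        best = d;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    // Range-like points (min, max): the two dimensions are correlated.
    int[][] pts = {{0, 1}, {1, 3}, {2, 4}, {10, 40}, {11, 45}, {12, 50}};

    // Suppose the root was split on dim 1 and the left cell holds pts[0..3).
    // Inherited bounds: parent bounds with only dim 1 narrowed to [1, 4].
    int inherited = widestDim(new int[] {0, 1}, new int[] {12, 4});

    // Recomputed bounds: taken from the points that are actually in the cell.
    int[][] b = computeBounds(pts, 0, 3);
    int recomputed = widestDim(b[0], b[1]);

    System.out.println(inherited + " " + recomputed); // prints "0 1"
  }
}
```

With inherited bounds, dimension 0 still looks 12 wide, so it is chosen even
though the points in the cell only span [0, 2] on it. Recomputing the bounds
shows that dimension 1 is the wider one for this cell.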