[ 
https://issues.apache.org/jira/browse/LUCENE-8928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890173#comment-16890173
 ] 

Ignacio Vera edited comment on LUCENE-8928 at 7/22/19 6:11 PM:
---------------------------------------------------------------

I ran this approach locally. It also helps quite a bit in the Geo3D 
(three-dimensional) case. I tried several ways to make indexing faster, 
but so far no luck:

 
||Approach||Index time Dev (sec)||Index time Base (sec)||Index time Diff||Force merge time Dev (sec)||Force merge time Base (sec)||Force merge time Diff||Index size Dev (GB)||Index size Base (GB)||Index size Diff||Reader heap Dev (MB)||Reader heap Base (MB)||Reader heap Diff||
|points|181.1s|124.4s|46%|76.9s|53.5s|44%|0.55|0.55|-0%|1.57|1.57|0%|
|shapes|327.4s|215.4s|52%|168.9s|120.2s|40%|1.28|1.29|-1%|1.62|1.61|0%|
|geo3d|211.9s|154.7s|37%|94.3s|66.4s|42%|0.75|0.75|-0%|1.58|1.58|0%|
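
For reading the Diff columns: the figures look consistent with the relative change of Dev over Base, (Dev - Base) / Base, up to rounding of the displayed values. That is an inference from the numbers, not something stated by the benchmark output. Under that reading, a positive Diff on a time column means Dev is slower, while a positive Diff on a throughput column means Dev is faster. A minimal sketch, with a hypothetical class and method name:

```java
final class RelativeDiff {
    // Relative change of dev over base, in percent, rounded to the nearest integer.
    // Assumption: this mirrors how the Diff columns were derived.
    static long percentDiff(double dev, double base) {
        return Math.round((dev - base) / base * 100.0);
    }

    public static void main(String[] args) {
        // Index time for points: Dev 181.1s vs Base 124.4s.
        System.out.println(percentDiff(181.1, 124.4) + "%"); // prints "46%"
    }
}
```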

 


 
||Approach||Shape||M hits/sec Dev||M hits/sec Base||M hits/sec Diff||QPS Dev||QPS Base||QPS Diff||Hit count Dev||Hit count Base||Hit count Diff||
|points|box|94.34|94.84|-1%|95.99|96.50|-1%|221118844|221118844| 0%|
|points|polyRussia|20.07|20.46|-2%|5.72|5.83|-2%|3508846|3508846| 0%|
|points|poly 10|88.64|87.56| 1%|56.05|55.37| 1%|355809475|355809475| 0%|
|points|polyMedium|10.47|10.54|-1%|128.26|129.15|-1%|2693559|2693559| 0%|
|points|distance|93.48|95.96|-3%|54.92|56.38|-3%|382961957|382961957| 0%|
|points|nearest 10|0.10|0.09|11%|9687.24|8755.72|11%|60844404|60844404| 0%|
|points|sort|43.12|43.04| 0%|43.88|43.80| 0%|221118844|221118844| 0%|
|shapes|box|66.02|52.23|26%|67.18|53.15|26%|221118844|221118844| 0%|
|shapes|polyRussia|11.57|9.85|17%|3.30|2.81|17%|3508846|3508846| 0%|
|shapes|poly 10|54.98|47.08|17%|34.77|29.77|17%|355809475|355809475| 0%|
|shapes|polyMedium|5.31|4.52|17%|65.01|55.39|17%|2693559|2693559| 0%|
|geo3d|box|79.17|66.22|20%|80.56|67.38|20%|221118844|221118844| 0%|
|geo3d|polyRussia|0.95|0.90| 5%|0.27|0.26| 5%|3508671|3508671| 0%|
|geo3d|poly 10|77.26|57.16|35%|48.85|36.14|35%|355855227|355855227| 0%|
|geo3d|polyMedium|0.95|0.69|37%|11.62|8.50|37%|2693545|2693545| 0%|
|geo3d|distance|95.35|76.17|25%|55.96|44.70|25%|383371884|383371884| 0%|

 



 

> BKDWriter could make splitting decisions based on the actual range of values
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-8928
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8928
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>
> Currently BKDWriter assumes that splitting on one dimension has no effect on 
> values in other dimensions. While this may be ok for geo points, this is 
> usually not true for ranges (or geo shapes, which are ranges too). Maybe we 
> could get better indexing by re-computing the range of values on each 
> dimension before making the choice of the split dimension?
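
The idea in the description can be illustrated with a small sketch. This is not Lucene's BKDWriter code (which works on packed byte[] values); the class, method names, and sample data are hypothetical. Each range is modelled as a 2D point (min, max). After splitting on dimension 0, the bounds inherited from the parent still suggest that dimension 1 is the widest, while the values actually present in the cell say otherwise.

```java
import java.util.Arrays;

final class SplitDimSketch {
    // Picks the dimension with the widest range among the values actually
    // present in points[from, to), recomputing min/max for every dimension.
    static int widestDim(int[][] points, int from, int to) {
        int numDims = points[from].length;
        int[] min = new int[numDims];
        int[] max = new int[numDims];
        Arrays.fill(min, Integer.MAX_VALUE);
        Arrays.fill(max, Integer.MIN_VALUE);
        for (int i = from; i < to; i++) {
            for (int d = 0; d < numDims; d++) {
                min[d] = Math.min(min[d], points[i][d]);
                max[d] = Math.max(max[d], points[i][d]);
            }
        }
        return widestDimFromBounds(min, max);
    }

    // Picks the widest dimension from precomputed cell bounds, e.g. bounds
    // inherited from the parent and narrowed only on the split dimension.
    static int widestDimFromBounds(int[] cellMin, int[] cellMax) {
        int best = 0;
        long bestRange = -1;
        for (int d = 0; d < cellMin.length; d++) {
            long range = (long) cellMax[d] - cellMin[d];
            if (range > bestRange) {
                bestRange = range;
                best = d;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Ranges encoded as (min, max), sorted by min.
        int[][] ranges = {{0, 5}, {10, 12}, {50, 100}, {90, 95}};
        // Right cell after splitting on dimension 0 at value 50: indices 2..3.
        int actual = widestDim(ranges, 2, 4);
        // Inherited bounds: dimension 0 narrowed to [50, 90], dimension 1
        // still carries the parent's [5, 100].
        int inherited = widestDimFromBounds(new int[] {50, 5}, new int[] {90, 100});
        System.out.println("actual=" + actual + " inherited=" + inherited);
        // prints "actual=0 inherited=1"
    }
}
```

Because max is never smaller than min, splitting on the min dimension also raises the lower bound of the max dimension in the right cell. The inherited bounds miss this, which is the kind of correlation the description refers to for ranges and shapes. Recomputing the bounds costs an extra pass over the cell's values.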


