[ 
https://issues.apache.org/jira/browse/LUCENE-8452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16575338#comment-16575338
 ] 

Nicholas Knize commented on LUCENE-8452:
----------------------------------------

bq. the dimensions are not independent.

That's what I mean by "...due to the dimensional expansion issue (especially 
with point shapes)." For point shapes, {{LatLonShape}} does the opposite of 
Principal Component Analysis. Instead of reducing dimensionality it expands it 
from two to six and subsequently perfectly correlates dimensions 1, 3, 5 and 2, 
4, 6. It is not at all optimal and not at all expected to perform anywhere 
close to a structure optimized for true two dimension point space. In fact it 
goes against every concept of a optimal structure.

bq. to see how the BKD tree behaves for different dimensions

As I mentioned, the split/merge algorithms for BKD are naive for ease of 
implementation. Given the lack of heuristics along dimensions I'd expect 
QPS/HPS to drop as more dimensions are added.  Add a high level of correlation 
between dimensions and I'd expect it to get even worse. Certainly, for high 
dimension indexing, work will need to be done to optimize the BKD structure. 
Alternatively, we can work on a separate codec (possibly a R*/X/PR-tree hybrid) 
derived from the current BKD implementation that is designed for higher 
dimension space - since the kd structure itself was not designed for such 
applications. And subsequently restrict BKD to two dimensions only. 

> BKD-based shape indexing benchmarks
> -----------------------------------
>
>                 Key: LUCENE-8452
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8452
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/sandbox
>            Reporter: Ignacio Vera
>            Priority: Major
>         Attachments: BKDperf.pdf
>
>
> Initial benchmarking of the new BKD-based shape indexing suggest that 
> searches can be somewhat under-performing.   I open this ticket to share the 
> findings and to open a discussion how to speed up the solution.
>  
> The first benchmark is done by using the current benchmark in luceneutils for 
> indexing points and search by bounding box. We would expect {{LatLonShape}} 
> to be slower that {{LatLonPoint}} but still having a good performance. The 
> results of running such benchmark in my computer looks like:
>  
> LatLonPoint:
> 89.717239531 sec to index
> INDEX SIZE: 0.5087761553004384 GB
> READER MB: 0.6098232269287109
> maxDoc=60844404
> totHits=221118844
> BEST M hits/sec: 72.91056132596746
> BEST QPS: 74.19031323419311 
>  
> LatLonShape:
> 89.388678805 sec to index
> INDEX SIZE: 1.3028179928660393 GB
> READER MB: 0.8827085494995117
> maxDoc=60844404
> totHits=221118844
> BEST M hits/sec: 1.0053836784184809
> BEST QPS: 1.0230305276205143
>  
> A second benchmark has been performed indexing around 10 million 4-side 
> polygons and around 3 million points. Searches are performed using bounding 
> boxes. The results are compared with spatial trees alternatives. Spatial 
> trees use a composite strategy, precision=0.001 degrees and distErrPct=0.25:
>  
> s2 (Geo3d):
> 1191.732124301 sec to index part 0
> INDEX SIZE: 3.2086284114047885 GB
> READER MB: 19.453557014465332
> maxDoc=12949519
> totHits=705758537
> BEST M hits/sec: 13.311369588840462
> BEST QPS: 4.243743434150063
>  
> quad (JTS):
> 3252.62925159 sec to index part 0
> INDEX SIZE: 4.5238002222031355 GB
> READER MB: 41.15725612640381
> maxDoc=12949519
> totHits=705758357
> BEST M hits/sec: 35.54591930673003
> BEST QPS: 11.332252412866938
>  
> LatLonShape:
> 30.32712009 sec to index part 0
> INDEX SIZE: 0.5627057952806354 GB
> READER MB: 0.29498958587646484
> maxDoc=12949519
> totHits=705758228
> BEST M hits/sec: 3.4130465326433357
> BEST QPS: 1.0880999177593018
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to