[ 
https://issues.apache.org/jira/browse/LUCENE-7251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15256335#comment-15256335
 ] 

Robert Muir commented on LUCENE-7251:
-------------------------------------

I investigated the russia poly just out of curiousity, and made it simpler by 
removing a lot of "hard stuff from it", just to see the overhead of having e.g. 
1k components, holes, crossing dateline, etc:
* Removed all islands and reduced to the one 4.4k "big part of russia" left of 
dateline: 24.75 MHPS/7.2 QPS
* Same as above but with its 3 holes removed: 25.88 MHPS/7.5 QPS

So the current 20.5 MHPS/5.84 QPS is reasonable IMO as far as what this patch 
tries to solve for "many components".  

You can inspect this monster visually here: 
http://www.geonames.org/2017370/russian-federation.html


> remove LatLonGrid
> -----------------
>
>                 Key: LUCENE-7251
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7251
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Robert Muir
>         Attachments: LUCENE-7251.patch, LUCENE-7251.patch
>
>
> This crutch doesn't speed up most polygons anymore, only some very complex 
> ones with many components/holes.
> Instead as a simple step, we can use a tree of components (organized by 
> bounding box x-intervals just like edges). This makes things less trappy for 
> crazy polygons like the russia one.
> Synthetic polygons from luceneUtil
> ||vertices||old QPS||new QPS|
> |5|40.5|43.8|
> |50|33.1|32.8|
> |500|31.9|31.9|
> |5000|29.4|29.6|
> |50000|20.4|22.8|
> |500000|4.0|6.9|
> Real polygons (33 london districts: 
> http://data.london.gov.uk/2011-boundary-files)
> ||vertices||old QPS||new QPS|
> |avg 5.6k|113.8|105.4|
> Russia geonames polygon (> 1000 components, crosses dateline, hugs poles, you 
> name it)
> ||vertices||old QPS||new QPS|
> |11598|1.17|5.35|
> The grid hurts russia (keeping it around -> 4 QPS), and you can see it also 
> hurts all the synthetic ones. Those london boundaries hit a sweet spot where 
> it helps just a tad but, I think we should remove it and its startup cost 
> along with it.
> We can probably organize the tree better to be more efficient with many 
> components: for contains() we could just pack them all into one poly. But i'm 
> worried what this will do for relations (there would be fake edges between 
> components i think?), and it would be complicated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to