[ 
https://issues.apache.org/jira/browse/LUCENE-8126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16369716#comment-16369716
 ] 

David Smiley commented on LUCENE-8126:
--------------------------------------

Thanks for investing the time into the illustrations [~ivera].  The diagram of 
the 3 prefix trees is very helpful.  Usually when I think of people 
indexing "squares" I believe the square is aligned to lines of longitude and 
latitude... but this is not true for the so-called "squares" for your use-case? 
 Regardless of that, people index all kinds of shapes, e.g. circles and 
polygons, and they will look different at different latitudes.  I didn't know 
that it affects the cell count this much -- thanks for enlightening me.  I knew 
it _could_ in what I thought were some extreme cases but your diagram seems to 
show 
it's typical.  Hmm.  _I wonder if similar results could be achieved by 
internally using the web-mercator projection_?  Of course some scheme is needed 
to handle the polar caps, which that projection doesn't even cover, but 
whatever.  The web-mercator projection stretches a shape by the same factor 
latitudinally and longitudinally, and thus would probably yield roughly 
similar numbers of cells at all latitudes, wouldn't it?
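
As a rough illustration of the "equal stretch" point above (a sketch only, not 
taken from the issue or its benchmark): the class and method names below are 
hypothetical, the Earth is treated as a sphere, and the shape is assumed to be 
a small ground square.  It compares the width/height ratio of that square in 
plain lat/lon degrees with the ratio after a web-mercator projection.  It says 
nothing about actual cell counts, which would still need measuring.

```java
// Hypothetical illustration; none of these names come from Lucene or S2.
class MercatorAspect {
    static final double EARTH_RADIUS_KM = 6371.0; // spherical approximation

    /** Web-mercator y (radians on a unit sphere) for a latitude in degrees. */
    static double mercatorY(double latDeg) {
        double phi = Math.toRadians(latDeg);
        return Math.log(Math.tan(Math.PI / 4 + phi / 2));
    }

    /** Width/height ratio, in lat/lon degrees, of a small ground square at latDeg. */
    static double aspectInLatLon(double latDeg) {
        // The latitude span does not depend on latitude; the longitude span
        // grows by 1/cos(lat).
        return 1.0 / Math.cos(Math.toRadians(latDeg));
    }

    /** Width/height ratio of the same square after projecting to web-mercator. */
    static double aspectInMercator(double latDeg, double sideKm) {
        double sideRad = sideKm / EARTH_RADIUS_KM;
        double halfLatDeg = Math.toDegrees(sideRad) / 2;
        // x is longitude in radians, so the width is the longitude span.
        double width = sideRad / Math.cos(Math.toRadians(latDeg));
        double height = mercatorY(latDeg + halfLatDeg) - mercatorY(latDeg - halfLatDeg);
        return width / height;
    }

    public static void main(String[] args) {
        for (double lat : new double[] {0, 30, 60, 80}) {
            System.out.printf("lat=%4.0f  lat/lon aspect=%.3f  mercator aspect=%.3f%n",
                lat, aspectInLatLon(lat), aspectInMercator(lat, 10.0));
        }
    }
}
```

In lat/lon space the square becomes an ever wider rectangle toward the poles, 
while in the projected space its proportions stay close to 1:1 (its projected 
size still grows with latitude).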

RE index size -- you probably had difficulty benchmarking the differences 
because you used Lucene defaults.  Switch to a doc count based index writer 
flush (instead of memory based), and use SerialMergeScheduler to get 
predictable segments, albeit with slower throughput; it's not a setup you would 
normally use in production.  This stuff can have a big impact on benchmark 
results, not just for index size but sometimes also for query benchmarks, 
depending on how "lucky" a benchmark run got if a big merge happened to yield 
far fewer segments.
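
A minimal sketch of such a benchmark-only writer setup, assuming lucene-core is 
on the classpath.  The helper class, method name and `docsPerFlush` parameter 
are hypothetical; the `IndexWriterConfig` and `SerialMergeScheduler` calls are 
the stock Lucene ones.

```java
import java.io.IOException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.SerialMergeScheduler;
import org.apache.lucene.store.Directory;

// Hypothetical helper for benchmarks only; not something to use in production.
class BenchmarkWriters {
    static IndexWriter open(Directory dir, int docsPerFlush) throws IOException {
        IndexWriterConfig cfg = new IndexWriterConfig(); // default analyzer
        // Enable the doc-count trigger first: the config rejects a state in
        // which both flush triggers are disabled.
        cfg.setMaxBufferedDocs(docsPerFlush);
        // Turn off the memory-based flush so segment sizes depend on doc count only.
        cfg.setRAMBufferSizeMB(IndexWriterConfig.DISABLE_AUTO_FLUSH);
        // Run merges sequentially on the indexing thread for repeatable segments.
        cfg.setMergeScheduler(new SerialMergeScheduler());
        return new IndexWriter(dir, cfg);
    }
}
```

With the same documents fed in the same order from a single thread, this 
should make the resulting segment layout much more repeatable from run to run.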

I'm having difficulty finding the benchmark; can you provide a link to the GH 
file?

At first I was unsure how S2 might improve point query performance, but after 
some thought I figure that the cell count discussion for indexed shapes would 
apply as well to the cells a query shape might have to traverse.  Again, I 
wonder if a web-mercator projection would get similar improvements?  

Another nice thing about a web-mercator based underlying coordinate system is 
that the index-time heatmap feature would produce a grid of numbers that are 
nice squares to be displayed in a web-mercator map client-side.  Today they 
tend to be horizontal rectangles that get flatter as you go to the poles.  It's 
not just about visual preference of squares; it's also about trying to ensure 
that any secondary processing of the raw heatmap data doesn't unintentionally 
skew/misrepresent data due to an assumption of a uniform grid when it's not 
actually uniform.  Sorry to get a little side-tracked but it's related.

> Spatial prefix tree based on S2 geometry
> ----------------------------------------
>
>                 Key: LUCENE-8126
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8126
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/spatial-extras
>            Reporter: Ignacio Vera
>            Assignee: Ignacio Vera
>            Priority: Major
>         Attachments: SPT-cell.pdf, SPT-query.jpeg
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Hi [~dsmiley],
> I have been working on a prefix tree based on Google S2 geometry 
> (https://s2geometry.io/) to be used mainly with Geo3d shapes with very 
> promising results, in particular for complex shapes (e.g. polygons). Using 
> this pixelization scheme reduces the size of the index, improves the 
> performance of the queries and reduces the loading time for non-point shapes. 
> If you are ok with this contribution and before providing any code I would 
> like to understand what the correct/preferred approach is:
> 1) Add a new dependency on the S2 library 
> (https://mvnrepository.com/artifact/io.sgr/s2-geometry-library-java). It has 
> Apache 2.0 license so it should be ok.
> 2) Create a utility class with all methods necessary to navigate the S2 tree 
> and create shapes from S2 cells (basically port what we need from the library 
> into Lucene).
> What do you think?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

