[ 
https://issues.apache.org/jira/browse/LUCENE-8396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16543708#comment-16543708
 ] 

Nicholas Knize edited comment on LUCENE-8396 at 7/13/18 9:02 PM:
-----------------------------------------------------------------

Thanks [~jpountz], [~mikemccand], and [~dsmiley] some great feedback here:

Great suggestions [~jpountz]. I'll make these changes, clean up the code, and 
push to sandbox for iterating.

[~dsmiley]
{quote}There are also some off the shelf computational geometry libraries that 
can simplify polygons (JTS can), and some users may want to do that when 
indexing shapes.
{quote}
I took a look at several of the OTS simplification a while ago. I'm sure there 
are many more out there than what I canvassed so this isn't intended to be a 
catch all comment. And we could certainly look at other tessellation libraries 
in the future for iterative performance improvements. But I did look at the JTS 
simplification tools. The problem with those varied: from tolerance value 
errors creating invalid polygons to general performance overhead. Each were 
great as a stand alone utility, but none were reliable nor performant enough 
for the scale desired.
{quote}they ought to use SerializedDVStrategy for accuracy combined with 
RecursivePrefixTreeStrategy for a grid index
{quote}
Indeed this is an optimization allowing one to index "larger" quad cells (to 
reduce the number of terms in the inverted index) and subsequently use the WKB 
{{BinaryDocValues}} for accurate relations. But not only is the use of 
BinaryDocValues with WKB limiting (with respect to shapes with large number of 
vertices and the marshalling/unmarshalling overhead) but all of the calls to 
{{.relate}} (with either JTS, S2, or other third party implementations) incur 
their own performance penalties.. not just for relating each quad cell to the 
query shape but also with having to unmarshall "leaf shapes" and compute the 
DE9IM to relate those to the query shape for accuracy.
{quote}combined with...and...wrapping both
{quote}
This was the other motivator... Like {{LatLonPoint}} its nice to have a simple 
API (hopefully intended for core) that can solve the general Shape indexing and 
search use case without having to decide which PrefixTree, Grid, and Shape 
Relation libraries to use. Then for the more expert use cases users can always 
move over to {{spatial-extras}} for all of the additional choices.



was (Author: nknize):
Thanks [~jpountz], [~mikemccand], and [~dsmiley] some great feedback here:

Great suggestions [~jpountz]. I'll make these changes, clean up the code, and 
push to sandbox for iterating.

[~dsmiley]
{quote}There are also some off the shelf computational geometry libraries that 
can simplify polygons (JTS can), and some users may want to do that when 
indexing shapes.
{quote}
I took a look at several of the OTS simplification a while ago. I'm sure there 
are many more out there than what I canvassed so this isn't intended to be a 
catch all comment. And we could certainly look at other tessellation libraries 
in the future for iterative performance improvements. But I did look at the JTS 
simplification tools. The problem with those varied: from tolerance value 
errors creating invalid polygons to general performance overhead. Each were 
great as a stand alone utility, but none were reliable nor performant enough 
for the scale desired.
{quote}they ought to use SerializedDVStrategy for accuracy combined with 
RecursivePrefixTreeStrategy for a grid index
{quote}
Indeed this is an optimization allowing one to index "larger" quad cells (to 
avoid all of the terms in the inverted index) and subsequently use the WKB 
{{BinaryDocValues}} for accurate relations. But not only is the use of 
BinaryDocValues with WKB limiting (with respect to shapes with large number of 
vertices and the marshalling/unmarshalling overhead) but all of the calls to 
{{.relate}} (with either JTS, S2, or other third party implementations) incur 
their own performance penalties.. not just for relating each quad cell to the 
query shape but also with having to unmarshall "leaf shapes" and compute the 
DE9IM to relate those to the query shape for accuracy.
{quote}combined with...and...wrapping both
{quote}
This was the other motivator... Like {{LatLonPoint}} its nice to have a simple 
API (hopefully intended for core) that can solve the general Shape indexing and 
search use case without having to decide which PrefixTree, Grid, and Shape 
Relation libraries to use. Then for the more expert use cases users can always 
move over to {{spatial-extras}} for all of the additional choices.


> Add Points Based Shape Indexing
> -------------------------------
>
>                 Key: LUCENE-8396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8396
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Nicholas Knize
>            Priority: Major
>         Attachments: LUCENE-8396.patch, polyWHole.png, tessellatedPoly.png
>
>
> I've been tinkering with this for a while and would like to solicit some 
> feedback. I'd like to introduce a new shape field based on the BKD/Points 
> codec to bring much of the Points based performance improvements to the shape 
> indexing and search usecase. Much like the existing shape indexing in 
> {{spatial-extras}} the shape will be decomposed into smaller parts, but 
> instead of decomposing into quad cells (which have the drawback of precision 
> accuracy and sheer volume of terms) I'd like to explore decomposing the 
> shapes into a triangular mesh; similar to gaming and computer graphics. Not 
> only does this approach reduce the number of terms, but it has the added 
> benefit of better accuracy (precision is based on the index encoding 
> technique instead of the spatial resolution of the quad cell). 
> For better clarity, consider the following illustrations (of a polygon in a 1 
> degree x 1 degree spatial area).  The first is using the quad tree technique 
> applied in the existing inverted index. The second is using a triangular mesh 
> decomposition as used by popular OpenGL and javascript rendering systems 
> (such as those used by mapbox).
> !polyWHole.png!
> Decomposing this shape using a quad tree results in 1,105,889 quad terms at 3 
> meter spatial resolution.
> !tessellatedPoly.png!
>  
> Decomposing using a triangular mesh results in 8 triangles at the same 
> resolution as {{encodeLat/Lon}}.
> The decomposed triangles can then be encoded as a 6 dimensional POINT and 
> queries are implemented using the computed relations against these triangles 
> (similar to how its done with the inverted index today).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to