[ 
https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14511391#comment-14511391
 ] 

Michael McCandless commented on LUCENE-6450:
--------------------------------------------

Thanks [~nknize], new patch looks great ... but can you add 
@lucene.experimental to all class-level javadocs so users know the index format 
is subject to change?

I think these classes really do belong in core: they cover the "common case" 
for spatial search.  But maybe we should start with sandbox for now since we 
may make changes that break the index format?

E.g. I think we should find a way to make use of index-time prefix terms (auto 
prefix or numeric field), because with the current patch we visit O(N) terms 
and O(N) docs in the common case (no two docs share exactly the same geo 
point), but with prefix terms we would visit O(log(N)) terms and the same O(N) 
docs.  Decoding the default block postings format is far more efficient than 
decoding the block terms dict, so offloading the work from terms dict -> 
postings should be a big win (the post-filtering work would be unchanged, but 
it would have to use doc values instead of the term).
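
To illustrate the O(log(N)) claim, here is a minimal sketch (not Lucene's 
actual code) of how one contiguous range of encoded values can be covered by a 
small number of aligned power-of-two blocks, each of which would correspond to 
one prefix term. The class name PrefixRangeSketch, the method cover, and the 
restriction 0 <= lo <= hi < 2^62 are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

public final class PrefixRangeSketch {

    /**
     * Covers [lo, hi] with aligned power-of-two blocks.
     * Each result is {prefix, shift}: it matches every value v with
     * (v >>> shift) == prefix, i.e. 2^shift consecutive values.
     * Assumes 0 <= lo <= hi < 2^62 (hypothetical limit to avoid overflow).
     */
    static List<long[]> cover(long lo, long hi) {
        List<long[]> terms = new ArrayList<>();
        while (lo <= hi) {
            int shift = 0;
            // Grow the block while lo stays aligned to the larger block size
            // and the larger block still ends inside [lo, hi].
            while (shift < 62
                    && (lo & ((1L << (shift + 1)) - 1)) == 0
                    && lo + ((1L << (shift + 1)) - 1) <= hi) {
                shift++;
            }
            terms.add(new long[] {lo >>> shift, shift});
            lo += 1L << shift;
        }
        return terms;
    }

    public static void main(String[] args) {
        List<long[]> terms = cover(1, 1022);
        System.out.println(terms.size() + " prefix terms cover 1022 values");
    }
}
```

A bounding box maps to several such ranges on a space filling curve, so the 
real term count would be higher, but each range should still decompose this 
way.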

We could do smart things in that case, e.g. carefully pick which prefix terms 
to make use of because they are 100% contained by the shape, and then OR that 
with another query that matches the "edge cells" that must do post-filtering.
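
A rough sketch of that split, assuming a simple quad-tree decomposition over a 
rectangular query shape: cells fully inside the shape need no post-filtering, 
while cells that still cross the boundary at the maximum depth are the "edge 
cells". The names CellCoverSketch, Rect, relate and cover are hypothetical and 
not part of the patch.

```java
import java.util.ArrayList;
import java.util.List;

public final class CellCoverSketch {

    enum Relation { OUTSIDE, CROSSES, WITHIN }

    /** Axis-aligned rectangle; used for both cells and the query shape. */
    static final class Rect {
        final double minX, minY, maxX, maxY;
        Rect(double minX, double minY, double maxX, double maxY) {
            this.minX = minX; this.minY = minY; this.maxX = maxX; this.maxY = maxY;
        }
    }

    /** How a cell relates to the query rectangle. */
    static Relation relate(Rect cell, Rect query) {
        if (cell.maxX <= query.minX || cell.minX >= query.maxX
                || cell.maxY <= query.minY || cell.minY >= query.maxY) {
            return Relation.OUTSIDE;
        }
        if (cell.minX >= query.minX && cell.maxX <= query.maxX
                && cell.minY >= query.minY && cell.maxY <= query.maxY) {
            return Relation.WITHIN;
        }
        return Relation.CROSSES;
    }

    /**
     * Recursively splits a cell into quadrants. Cells fully within the query
     * go to inner (no post-filter needed); crossing cells at maxDepth go to
     * edge (docs must still be checked individually).
     */
    static void cover(Rect cell, Rect query, int depth, int maxDepth,
                      List<Rect> inner, List<Rect> edge) {
        Relation r = relate(cell, query);
        if (r == Relation.OUTSIDE) {
            return;
        }
        if (r == Relation.WITHIN) {
            inner.add(cell);
            return;
        }
        if (depth == maxDepth) {
            edge.add(cell);
            return;
        }
        double midX = (cell.minX + cell.maxX) / 2;
        double midY = (cell.minY + cell.maxY) / 2;
        cover(new Rect(cell.minX, cell.minY, midX, midY), query, depth + 1, maxDepth, inner, edge);
        cover(new Rect(midX, cell.minY, cell.maxX, midY), query, depth + 1, maxDepth, inner, edge);
        cover(new Rect(cell.minX, midY, midX, cell.maxY), query, depth + 1, maxDepth, inner, edge);
        cover(new Rect(midX, midY, cell.maxX, cell.maxY), query, depth + 1, maxDepth, inner, edge);
    }

    public static void main(String[] args) {
        List<Rect> inner = new ArrayList<>();
        List<Rect> edge = new ArrayList<>();
        cover(new Rect(0, 0, 4, 4), new Rect(0.5, 0.5, 3.5, 3.5), 0, 2, inner, edge);
        System.out.println(inner.size() + " inner cells, " + edge.size() + " edge cells");
    }
}
```

In the query described above, the inner cells would become plain term matches 
and only docs in the edge cells would go through the post-filter.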

Maybe we should try a different space filling curve: e.g. I think Hilbert 
curves would be good since they have better spatial locality.  They do have a 
higher index-time cost to encode, which is fine, and if we have to cut over to 
doc values for post-filtering anyway (if we use the prefix terms) then we 
wouldn't need to pay a Hilbert decode cost at search time.
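
For reference, a small sketch of Hilbert encoding and decoding on an n-by-n 
grid, using the well-known iterative rotate-and-flip formulation. The class 
and method names (HilbertSketch, xyToIndex, indexToXy) are made up for this 
example, and n is assumed to be a power of two. The per-level rotation is the 
extra encode cost compared with Morton's plain bit interleave.

```java
public final class HilbertSketch {

    /** Maps cell (x, y) on an n-by-n grid (n a power of two) to its Hilbert index. */
    static int xyToIndex(int n, int x, int y) {
        int d = 0;
        for (int s = n / 2; s > 0; s /= 2) {
            int rx = (x & s) > 0 ? 1 : 0;
            int ry = (y & s) > 0 ? 1 : 0;
            d += s * s * ((3 * rx) ^ ry);
            // Rotate/flip the quadrant so the sub-curve has the right orientation.
            if (ry == 0) {
                if (rx == 1) {
                    x = n - 1 - x;
                    y = n - 1 - y;
                }
                int t = x;
                x = y;
                y = t;
            }
        }
        return d;
    }

    /** Inverse of xyToIndex: returns {x, y} for Hilbert index d. */
    static int[] indexToXy(int n, int d) {
        int x = 0;
        int y = 0;
        int t = d;
        for (int s = 1; s < n; s *= 2) {
            int rx = 1 & (t / 2);
            int ry = 1 & (t ^ rx);
            if (ry == 0) {
                if (rx == 1) {
                    x = s - 1 - x;
                    y = s - 1 - y;
                }
                int tmp = x;
                x = y;
                y = tmp;
            }
            x += s * rx;
            y += s * ry;
            t /= 4;
        }
        return new int[] {x, y};
    }

    public static void main(String[] args) {
        // Walk the curve on a 4x4 grid; consecutive indices are neighboring cells.
        for (int d = 0; d < 16; d++) {
            int[] xy = indexToXy(4, d);
            System.out.println(d + " -> (" + xy[0] + ", " + xy[1] + ")");
        }
    }
}
```

Consecutive Hilbert indices always land in adjacent cells, which is the 
locality property referred to above; a Morton curve makes long jumps between 
some consecutive values.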

But this all should come later: I think this patch is a huge step forward 
already.

> Add simple encoded GeoPointField type to core
> ---------------------------------------------
>
>                 Key: LUCENE-6450
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6450
>             Project: Lucene - Core
>          Issue Type: New Feature
>    Affects Versions: Trunk, 5.x
>            Reporter: Nicholas Knize
>            Priority: Minor
>         Attachments: LUCENE-6450-5x.patch, LUCENE-6450-TRUNK.patch, 
> LUCENE-6450.patch
>
>
> At the moment all spatial capabilities, including basic point based indexing 
> and querying, require the lucene-spatial module. The spatial module, designed 
> to handle all things geo, requires dependency overhead (spatial4j, jts) to provide 
> spatial rigor for even the most simplistic spatial search use-cases (e.g., 
> lat/lon bounding box, point in poly, distance search). This feature trims the 
> overhead by adding a new GeoPointField type to core along with 
> GeoBoundingBoxQuery and GeoPolygonQuery classes to the .search package. This 
> field is intended as a straightforward lightweight type for the most basic 
> geo point use-cases without the overhead. 
> The field uses simple bit twiddling operations (currently morton hashing) to 
> encode lat/lon into a single long term.  The queries leverage simple 
> multi-phase filtering that starts by leveraging NumericRangeQuery to reduce 
> candidate terms deferring the more expensive mathematics to the smaller 
> candidate sets.
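
For readers unfamiliar with the morton hashing mentioned in the description, a 
minimal sketch of the idea: quantize lat and lon to integers, then interleave 
their bits into one long. This is not the patch's code; the class name 
MortonSketch, the 31 bits per dimension, and the choice of which dimension 
gets the odd bits are assumptions made for the example.

```java
public final class MortonSketch {

    // Hypothetical resolution: 31 bits per dimension, 62 bits in total.
    private static final int BITS = 31;

    /** Scales a value in [min, max] to an integer cell in [0, 2^BITS - 1]. */
    static long quantize(double value, double min, double max) {
        long cells = 1L << BITS;
        long q = (long) ((value - min) / (max - min) * cells);
        return Math.min(q, cells - 1); // clamp the upper bound into the last cell
    }

    /** Spreads the low 32 bits of v so that bit i moves to bit 2*i. */
    static long spread(long v) {
        v &= 0xFFFFFFFFL;
        v = (v | (v << 16)) & 0x0000FFFF0000FFFFL;
        v = (v | (v << 8)) & 0x00FF00FF00FF00FFL;
        v = (v | (v << 4)) & 0x0F0F0F0F0F0F0F0FL;
        v = (v | (v << 2)) & 0x3333333333333333L;
        v = (v | (v << 1)) & 0x5555555555555555L;
        return v;
    }

    /** Inverse of spread: collects the even bits of v into the low 32 bits. */
    static long compact(long v) {
        v &= 0x5555555555555555L;
        v = (v | (v >>> 1)) & 0x3333333333333333L;
        v = (v | (v >>> 2)) & 0x0F0F0F0F0F0F0F0FL;
        v = (v | (v >>> 4)) & 0x00FF00FF00FF00FFL;
        v = (v | (v >>> 8)) & 0x0000FFFF0000FFFFL;
        v = (v | (v >>> 16)) & 0x00000000FFFFFFFFL;
        return v;
    }

    /** Interleaves lon (odd bits) and lat (even bits) into a single long. */
    static long encode(double lat, double lon) {
        long lonCell = quantize(lon, -180.0, 180.0);
        long latCell = quantize(lat, -90.0, 90.0);
        return (spread(lonCell) << 1) | spread(latCell);
    }

    public static void main(String[] args) {
        long code = encode(40.7, -74.0);
        System.out.println("lon cell: " + compact(code >>> 1));
        System.out.println("lat cell: " + compact(code));
    }
}
```

Because nearby points share high-order bits of the encoded value, a bounding 
box can be turned into numeric ranges over the single long term, which is what 
makes the NumericRangeQuery first phase possible.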



