[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14511391#comment-14511391 ]
Michael McCandless commented on LUCENE-6450:
--------------------------------------------

Thanks [~nknize], the new patch looks great ... but can you add @lucene.experimental to all class-level javadocs so users know the index format is subject to change?

I think these classes really do belong in core: they cover the "common case" for spatial search. But maybe we should start with sandbox for now, since we may make changes that break the index format?

E.g. I think we should find a way to make use of index-time prefix terms (auto prefix or numeric field). With the patch as it stands, we will visit O(N) terms and O(N) docs in the common case (no two docs have exactly the same geo point). If we can use prefix terms, we visit O(log(N)) terms and the same O(N) docs. The default block postings format is far more efficient to decode than the block terms dict, so offloading the work from the terms dict to the postings should be a big win. The post-filtering work would be unchanged, but it would have to use doc values rather than the term. We could do smart things in that case, e.g. carefully pick which prefix terms to use because they are 100% contained by the shape, and then OR that with another query that matches the "edge cells" that must do post-filtering.

Maybe we should try a different space-filling curve? E.g. I think Hilbert curves would be good since they have better spatial locality. They do have a higher index-time encoding cost, which is fine. And if we have to cut over to doc values for post-filtering anyway (if we use the prefix terms), then we wouldn't need to pay a Hilbert decode cost at search time.

But this all should come later: I think this patch is a huge step forward already.
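The O(N) versus O(log(N)) terms argument can be illustrated in one dimension: a query range over encoded term values can be covered by a small set of aligned prefix cells instead of enumerating every distinct term. The Java sketch below greedily splits [lo, hi] into power-of-two-aligned blocks, one bit per level, in the spirit of how NumericRangeQuery uses trie terms with precisionStep = 1. It is a hypothetical illustration, not code from the patch: the names PrefixRangeSketch, Cell and decompose are made up, and it ignores the 2D distinction between fully contained cells and "edge cells".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: cover [lo, hi] with aligned prefix cells instead of visiting every term.
final class PrefixRangeSketch {

    // A prefix cell: every value whose high bits equal those of `prefix`;
    // the low `shift` bits are wildcards (and are zero in `prefix`).
    static final class Cell {
        final long prefix;
        final int shift;

        Cell(long prefix, int shift) {
            this.prefix = prefix;
            this.shift = shift;
        }

        long min() { return prefix; }
        long max() { return prefix | ((1L << shift) - 1); }

        @Override
        public String toString() { return "[" + min() + "," + max() + "]"; }
    }

    // Greedy decomposition; restricted to 0 <= lo <= hi < 2^62 so no arithmetic overflows.
    static List<Cell> decompose(long lo, long hi) {
        if (lo < 0 || hi >= (1L << 62) || lo > hi) {
            throw new IllegalArgumentException("need 0 <= lo <= hi < 2^62");
        }
        List<Cell> cells = new ArrayList<>();
        while (lo <= hi) {
            int shift = 0;
            // Widen the cell one bit at a time while it stays aligned on lo and does not pass hi.
            while (shift < 62) {
                long size = 1L << (shift + 1);
                if ((lo & (size - 1)) != 0 || lo + size - 1 > hi) {
                    break;
                }
                shift++;
            }
            cells.add(new Cell(lo, shift));
            lo += 1L << shift; // continue right after this cell
        }
        return cells;
    }
}
```

For example, decompose(1, 14) yields six cells ([1,1], [2,3], [4,7], [8,11], [12,13], [14,14]), while an aligned range of 2^20 values collapses to a single cell. The number of cells grows with the number of bits in the range, not with the number of distinct values in it, which is why visiting prefix terms shifts the work toward postings decoding.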
> Add simple encoded GeoPointField type to core
> ---------------------------------------------
>
> Key: LUCENE-6450
> URL: https://issues.apache.org/jira/browse/LUCENE-6450
> Project: Lucene - Core
> Issue Type: New Feature
> Affects Versions: Trunk, 5.x
> Reporter: Nicholas Knize
> Priority: Minor
> Attachments: LUCENE-6450-5x.patch, LUCENE-6450-TRUNK.patch, LUCENE-6450.patch
>
>
> At the moment all spatial capabilities, including basic point based indexing
> and querying, require the lucene-spatial module. The spatial module, designed
> to handle all things geo, requires dependency overhead (s4j, jts) to provide
> spatial rigor for even the most simplistic spatial search use-cases (e.g.,
> lat/lon bounding box, point in poly, distance search). This feature trims the
> overhead by adding a new GeoPointField type to core along with
> GeoBoundingBoxQuery and GeoPolygonQuery classes to the .search package. This
> field is intended as a straightforward lightweight type for the most basic
> geo point use-cases without the overhead.
> The field uses simple bit twiddling operations (currently morton hashing) to
> encode lat/lon into a single long term. The queries leverage simple
> multi-phase filtering that starts by leveraging NumericRangeQuery to reduce
> candidate terms, deferring the more expensive mathematics to the smaller
> candidate sets.
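The issue description says the field encodes lat/lon into a single long term using morton hashing (bit interleaving). A minimal Java sketch of that encoding follows, assuming each axis is scaled linearly to an unsigned 32-bit integer and longitude bits occupy the even positions. The scaling scheme and all names (MortonSketch, scale, spread, compact, mortonHash) are hypothetical and not taken from the attached patches.

```java
// Hypothetical sketch of morton (Z-order) encoding of a lat/lon pair into one long.
final class MortonSketch {

    // Assumed scaling: map a degree value in [min, max] linearly onto [0, 2^32 - 1].
    static long scale(double value, double min, double max) {
        if (!(value >= min && value <= max)) { // also rejects NaN
            throw new IllegalArgumentException("out of range: " + value);
        }
        double unit = (value - min) / (max - min);
        return (long) (unit * 0xFFFFFFFFL);
    }

    // Spread the low 32 bits of v so that bit i moves to bit 2i.
    static long spread(long v) {
        v &= 0xFFFFFFFFL;
        v = (v | (v << 16)) & 0x0000FFFF0000FFFFL;
        v = (v | (v << 8)) & 0x00FF00FF00FF00FFL;
        v = (v | (v << 4)) & 0x0F0F0F0F0F0F0F0FL;
        v = (v | (v << 2)) & 0x3333333333333333L;
        v = (v | (v << 1)) & 0x5555555555555555L;
        return v;
    }

    // Inverse of spread: gather the even bits of v back into the low 32 bits.
    static long compact(long v) {
        v &= 0x5555555555555555L;
        v = (v | (v >>> 1)) & 0x3333333333333333L;
        v = (v | (v >>> 2)) & 0x0F0F0F0F0F0F0F0FL;
        v = (v | (v >>> 4)) & 0x00FF00FF00FF00FFL;
        v = (v | (v >>> 8)) & 0x0000FFFF0000FFFFL;
        v = (v | (v >>> 16)) & 0x00000000FFFFFFFFL;
        return v;
    }

    // Interleave: longitude bits at even positions, latitude bits at odd positions.
    static long mortonHash(double lon, double lat) {
        long x = scale(lon, -180.0, 180.0);
        long y = scale(lat, -90.0, 90.0);
        return spread(x) | (spread(y) << 1);
    }
}
```

Decoding is the reverse: compact(hash) recovers the scaled longitude and compact(hash >>> 1) the scaled latitude. Because nearby points tend to share high-order interleaved bits, a bounding box maps onto a limited set of term ranges, which is what lets a NumericRangeQuery-style first phase prune candidates before the exact geometric checks.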