[ https://issues.apache.org/jira/browse/LUCENE-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538079#comment-14538079 ]
Michael McCandless commented on LUCENE-6477: -------------------------------------------- I tested performance with the same "bounding boxes around London, UK" on 60.8M points test from LUCENE-6450: {noformat} Index size disk: 1.2 GB Index size heap: 0.75 MB Index time: 636.0 seconds (incl. forceMerge) Mean query time: .0068 sec {noformat} Indexing time is a bit slower (makes heavy use of OfflineSorter, but should be O(N*log(N) overall). It's very fast at search time: ~5.7X faster than GeoHashPrefixTree. It also uses very little heap (0.75 MB) at search time for the inner nodes of the tree. It currently allocates FixedBitSet(maxDoc) per query & segment at search time (like spatial does with RecursivePrefixTreeStrategy, I think?). Really it should use BitDocIdSet.Builder, to start sparse and upgrade to FixedBitSet only if result set will be "big-ish", but when I do that the query is 2.4X slower, which is frustrating. I think we could try using SentinelIntSet for the sparse case, like spatial does in some cases. > Add BKD tree for spatial shape query intersecting indexed points > ---------------------------------------------------------------- > > Key: LUCENE-6477 > URL: https://issues.apache.org/jira/browse/LUCENE-6477 > Project: Lucene - Core > Issue Type: New Feature > Reporter: Michael McCandless > Assignee: Michael McCandless > Fix For: Trunk, 5.2 > > Attachments: LUCENE-6477.patch > > > I'd like to explore using dedicated spatial trees for faster shape > intersection filters than postings-based implementations. > I implemented the tree data structure from > https://www.cs.duke.edu/~pankaj/publications/papers/bkd-sstd.pdf > The idea is simple: it builds a full binary tree, partitioning 2D > space, alternately on lat and then lon, into smaller and smaller > rectangles until a leaf has <= N (default 1024) points. > It cannot index shapes (just points), and can then do fast shape > intersection queries. Multi-valued fields are supported. > I only implemented the "point is contained in this bounding box" query > for now, but I think polygon shape querying should be easy to > implement using the same approach from LUCENE-6450. > For indexing, you add BKDPointField (takes lat, lon) to your doc, and > must set up your Codec use BKDTreeDocValuesFormat for that field. > This DV format wraps Lucene50DVFormat, but then builds the disk-based > BKD tree structure on the side. BKDPointInBBoxQuery then requires this > DVFormat, and casts it to gain access to the tree. > I quantize each incoming double lat/lon to 32 bits precision (so 64 > bits per point) = ~9 milli-meter lon precision at the equator, I > think. -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org