[ https://issues.apache.org/jira/browse/LUCENE-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Smiley updated LUCENE-5579:
---------------------------------
Attachment: LUCENE-5579_CompositeSpatialStrategy.patch
I moved the leaf cell differentiation to its own issue where it belongs --
LUCENE-6362.
The attached patch addresses the two-phase requirement (search the index, then
verify) in a single new "CompositeStrategy", so that we get both speed and
accuracy, with the convenience made possible by Lucene 5.1's low-level
TwoPhaseIterator. I'm not married to the name; I'm not sure what to call it.
This patch contains two Query implementations. One is specifically for
optimizing the Intersects predicate; it uniquely retains a DocIdSet of "exact"
(aka pre-confirmed) hits separate from the overall approximate set. The other
is more generic but must verify every hit by looking up the geometry and
applying the predicate.
There are a lot of TODO comments, and I haven't yet tested it as thoroughly as
I'd like. So it's not done, but it does seem to work.
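The two Query implementations described above differ mainly in which
candidates skip verification. Below is a minimal, Lucene-free sketch of that
idea. It is not the patch's code. java.util.BitSet stands in for Lucene's
DocIdSet, and an IntPredicate stands in for looking up a document's geometry
and applying the predicate. TwoPhaseSketch is a hypothetical name.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

// Hypothetical, Lucene-free model of two-phase matching: BitSet stands in for
// Lucene's DocIdSet, and IntPredicate stands in for loading a document's
// serialized geometry and applying the spatial predicate.
final class TwoPhaseSketch {
    private final BitSet approximation;  // phase 1: all candidates from the index
    private final BitSet exact;          // candidates already known to match
    private final IntPredicate verifier; // phase 2: the expensive exact check
    int verifications = 0;               // how many times the verifier ran

    TwoPhaseSketch(BitSet approximation, BitSet exact, IntPredicate verifier) {
        this.approximation = approximation;
        this.exact = exact;
        this.verifier = verifier;
    }

    // Called only for documents the approximation produced.
    boolean matches(int doc) {
        if (exact.get(doc)) {
            return true; // pre-confirmed: no geometry lookup needed
        }
        verifications++;
        return verifier.test(doc);
    }

    // Walks the approximation in doc-ID order and keeps verified hits.
    BitSet collect() {
        BitSet hits = new BitSet();
        for (int doc = approximation.nextSetBit(0); doc >= 0;
                doc = approximation.nextSetBit(doc + 1)) {
            if (matches(doc)) {
                hits.set(doc);
            }
        }
        return hits;
    }
}
```

If you pass an empty exact set, the sketch behaves like the generic query and
verifies every candidate. In Lucene 5.1, the corresponding pieces are
TwoPhaseIterator's approximation(), which supplies the candidates, and its
matches() method, which performs the per-document check. The sketch folds both
into one loop.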
I did some benchmarking but had to stop, as I can't spend more time on this
right now. I'm puzzled that I don't see any performance improvement from the
optimized Intersects predicate. I debugged it and observed that, oddly enough,
my benchmark setup doesn't seem to yield any exact/confirmed hits, yet my tests
do show that such hits occur. So maybe it's a bug in the benchmark
configuration, or something else entirely.
The patch includes a refactoring that makes
org.apache.lucene.search.ConstantScoreWeight public instead of package-private.
It's very general and useful. I think the same could be done for a
constant-scoring Scorer.
> Spatial, enhance RPT to differentiate confirmed from non-confirmed hits, then
> validate with SDV
> -----------------------------------------------------------------------------------------------
>
> Key: LUCENE-5579
> URL: https://issues.apache.org/jira/browse/LUCENE-5579
> Project: Lucene - Core
> Issue Type: New Feature
> Components: modules/spatial
> Reporter: David Smiley
> Attachments: LUCENE-5579_CompositeSpatialStrategy.patch,
> LUCENE-5579_SPT_leaf_covered.patch
>
>
> If a cell is within the query shape (doesn't straddle the edge), then you can
> be sure that all documents it matches are confirmed hits. But documents that
> match only via edge cells could be validated against SerializedDVStrategy for
> precise spatial search. This should be *much* faster than using RPT and
> SerializedDVStrategy independently on the same search, particularly when a lot
> of documents match.
> Perhaps this'll be a new RPT subclass, or maybe an optional configuration of
> RPT. This issue is just for the Intersects predicate, which will also apply to
> Disjoint (the negation of Intersects). Until they are addressed in other
> issues, the other predicates can be handled in a naive/slow way by creating a
> filter that combines RPT's filter and SerializedDVStrategy's filter using
> BitsFilteredDocIdSet.
> One thing I'm not sure of is how to expose the underlying functionality to
> Lucene-spatial users so that they can put other queries/filters in between RPT
> and SerializedDVStrategy. Maybe that'll be done by simply ensuring the
> predicate filters have this capability and are public.
> It would be ideal to implement this capability _after_ the PrefixTree term
> encoding is modified to differentiate edge leaf cells from non-edge leaf
> cells. This distinction will allow the code here to make more confirmed
> matches.
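The issue description's premise is that documents found in a cell lying
entirely within the query shape are confirmed hits, while documents found only
in edge cells need an exact check. A few lines of code can model this. The
sketch below is hypothetical and single-level, not RPT. The query shape is a
circle, and the cells form a flat grid of squares, whereas RPT uses a
hierarchical prefix tree. Documents are points, and an in-memory list stands in
for SerializedDVStrategy's serialized geometry. CellClassifierSketch and its
methods are illustrative names only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-level model: square grid cells, point documents, and a
// circular query shape. Not RPT, which uses a hierarchical prefix tree.
final class CellClassifierSketch {
    enum Relation { WITHIN, EDGE, DISJOINT }

    private final double cellSize;
    // "Index": cell key -> IDs of documents whose point falls in that cell.
    private final Map<String, List<Integer>> index = new HashMap<>();
    // Stand-in for SerializedDVStrategy's per-document serialized geometry.
    private final List<double[]> geometry = new ArrayList<>();
    int verifications = 0; // exact geometry checks performed

    CellClassifierSketch(double cellSize) {
        this.cellSize = cellSize;
    }

    private static String key(long col, long row) {
        return col + "," + row;
    }

    // Indexes a point document and returns its doc ID.
    int add(double x, double y) {
        int doc = geometry.size();
        geometry.add(new double[] {x, y});
        long col = (long) Math.floor(x / cellSize);
        long row = (long) Math.floor(y / cellSize);
        index.computeIfAbsent(key(col, row), k -> new ArrayList<>()).add(doc);
        return doc;
    }

    private static boolean inCircle(double x, double y, double cx, double cy, double r) {
        double dx = x - cx, dy = y - cy;
        return dx * dx + dy * dy <= r * r;
    }

    private Relation relate(long col, long row, double cx, double cy, double r) {
        double minX = col * cellSize, minY = row * cellSize;
        double maxX = minX + cellSize, maxY = minY + cellSize;
        // A circle is convex, so a square is inside it iff all four corners are.
        if (inCircle(minX, minY, cx, cy, r) && inCircle(maxX, minY, cx, cy, r)
                && inCircle(minX, maxY, cx, cy, r) && inCircle(maxX, maxY, cx, cy, r)) {
            return Relation.WITHIN;
        }
        // Otherwise the cell straddles the edge iff its point nearest the center is inside.
        double nx = Math.max(minX, Math.min(cx, maxX));
        double ny = Math.max(minY, Math.min(cy, maxY));
        return inCircle(nx, ny, cx, cy, r) ? Relation.EDGE : Relation.DISJOINT;
    }

    // Intersects query: confirmed hits from WITHIN cells, verified hits from EDGE cells.
    List<Integer> intersects(double cx, double cy, double r) {
        List<Integer> hits = new ArrayList<>();
        long col0 = (long) Math.floor((cx - r) / cellSize), col1 = (long) Math.floor((cx + r) / cellSize);
        long row0 = (long) Math.floor((cy - r) / cellSize), row1 = (long) Math.floor((cy + r) / cellSize);
        for (long col = col0; col <= col1; col++) {
            for (long row = row0; row <= row1; row++) {
                List<Integer> docs = index.get(key(col, row));
                if (docs == null) {
                    continue; // no documents indexed in this cell
                }
                Relation rel = relate(col, row, cx, cy, r);
                for (int doc : docs) {
                    if (rel == Relation.WITHIN) {
                        hits.add(doc); // confirmed without loading geometry
                    } else if (rel == Relation.EDGE) {
                        verifications++; // exact check, like SerializedDVStrategy
                        double[] p = geometry.get(doc);
                        if (inCircle(p[0], p[1], cx, cy, r)) {
                            hits.add(doc);
                        }
                    }
                }
            }
        }
        Collections.sort(hits);
        return hits;
    }
}
```

With point documents, every document in a WITHIN cell is returned without its
geometry being read. Only documents in EDGE cells pay for the exact check. For
indexed shapes that are larger than a single point, an indexed cell may only
partly overlap the document's shape. That makes confirmation less direct, and
it appears to be why the edge vs. non-edge leaf-cell distinction above would
allow more confirmed matches.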
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)