[
https://issues.apache.org/jira/browse/LUCENE-8091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Adrien Grand updated LUCENE-8091:
---------------------------------
Attachment: LUCENE-8091.patch
Here is a prototype that demonstrates the idea in the 1D case. For instance
when running a nearest-neighbor search given a dataset of 1M points uniformly
distributed between 0 and 1M, a regular sorted search needs to visit the 1M
documents (as expected), while this new special query only requires ~11k calls
to DocIdSetIterator.nextDoc / TopDocsCollector.collect, ~32k calls to
IntersectVisitor.visit, ~3k calls to IntersectVisitor.compare and runs about 7x
faster.
This patch needs a lot of cleaning/testing before being ready to commit.
> Better nearest-neighbor queries
> -------------------------------
>
> Key: LUCENE-8091
> URL: https://issues.apache.org/jira/browse/LUCENE-8091
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
> Attachments: LUCENE-8091.patch
>
>
> LatLonPoint.nearest is very efficient at identifying the top-k documents
> sorted by distance from a given point, by working directly on the BKD tree.
> This doesn't support filtering though, so if you need to filter by another
> property, you need to switch to querying on the filter and sorting by a
> LatLonPointSortField. Unfortunately this requires visiting all documents that
> match the filter.
> We could leverage the new {{setMinCompetitiveScore}} API introduced in
> LUCENE-4100 in order to allow for retrieval of nearest neighbors with
> arbitrary filters, by recomputing a bounding-box when a new minimum
> competitive score is set.
> In the future we could also leverage this to speed up queries that are
> boosted by distance. For instance if the final score is a weighted sum of the
> score on a text field and a distance-based score, and the minimum competitive
> score gets higher than the maximum score that may be produced on the text
> field at some point, then we could dynamically prune hits based on the
> distance.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]