[ 
https://issues.apache.org/jira/browse/LUCENE-8091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-8091:
---------------------------------
    Attachment: LUCENE-8091.patch

Here is a prototype that demonstrates the idea in the 1D case. For instance 
when running a nearest-neighbor search given a dataset of 1M points uniformly 
distributed between 0 and 1M, a regular sorted search needs to visit the 1M 
documents (as expected), while this new special query only requires ~11k calls 
to DocIdSetIterator.nextDoc / TopDocsCollector.collect, ~32k calls to 
IntersectVisitor.visit, ~3k calls to IntersectVisitor.compare and runs about 7x 
faster.

This patch needs a lot of cleaning/testing before being ready to commit.

> Better nearest-neighbor queries
> -------------------------------
>
>                 Key: LUCENE-8091
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8091
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-8091.patch
>
>
> LatLonPoint.nearest is very efficient at identifying the top-k documents 
> sorted by distance from a given point, by working directly on the BKD tree. 
> This doesn't support filtering though, so if you need to filter by another 
> property, you need to switch to querying on the filter and sorting by a 
> LatLonPointSortField. Unfortunately this requires visiting all documents that 
> match the filter.
> We could leverage the new {{setMinCompetitiveScore}} API introduced in 
> LUCENE-4100 in order to allow for retrieval of nearest neighbors with 
> arbitrary filters, by recomputing a bounding-box when a new minimum 
> competitive score is set.
> In the future we could also leverage this to speed up queries that are 
> boosted by distance. For instance if the final score is a weighted sum of the 
> score on a text field and a distance-based score, and the minimum competitive 
> score gets higher than the maximum score that may be produced on the text 
> field at some point, then we could dynamically prune hits based on the 
> distance.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to