[ 
https://issues.apache.org/jira/browse/LUCENE-7069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-7069:
---------------------------------------
    Attachment: LUCENE-7069.patch

Here's another patch ... I think it's nearly ready.

I added "nearest neighbor" to the luceneutil London, UK geo benchmark, and 
getting top 10 nearest hits from the center of each of the 225 boxes that 
benchmark tests is extremely fast ... less than 0.5 msec for each call ... KD 
trees are very good for this ;)

This adds {{LatLonPoint.nearest}}, and it now takes an {{int topN}} (vs. single 
nearest point in previous patches).

I think it solves the adversary issues of the previous patches, at least the 
ones that are solvable.  Some are not!  Imagine the index only contains many 
points indexed from a circle, and then you ask for nearest from its center ... 
no matter your algorithm, that will amount to a linear scan.

There's one wrinkle though: the search works directly with {{BKDReader}}, 
because it's a best first search across all segments, picking which cell 
(across all segments) is the most "promising" to descend into next.  This means 
it requires that you use {{Lucene60PointsFormat}} (the default codec), and it 
gets the underlying {{BKDReader}} from that.  I think making this work only 
through the codec APIs is ... tricky.

There are further optimizations we could do like using {{haversinSortKey}}, or 
maybe fixing the {{approxBestDistance}} to be a true best distance (I need 
heavy geo math help for that!), and using Lucene's priority queue not the 
JDK's, but these all can come later.


> Add LatLonPoint.nearest to find closest indexed point to a given query point
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-7069
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7069
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: master, 6.1
>
>         Attachments: LUCENE-7069.patch, LUCENE-7069.patch, LUCENE-7069.patch
>
>
> KD trees (used by Lucene's new dimensional points) excel at finding "nearest 
> neighbors" to a given query point ... I think we should add this to Lucene's 
> sandbox as:
> {noformat}
>   public static Document nearest(IndexReader r, String field, double lat, 
> double lon) throws IOException
> {noformat}
> I only implemented the 1 nearest neighbor for starters ... I think we can 
> easily generalize this in the future to K nearest.
> It could also be generalized to more than 2 dimensions, but for now I'm 
> making the class package private in sandbox for just the geo2d (lat/lon) use 
> case.
> I don't think this should go into 6.0.0, but should go into 6.1: it's a new 
> feature, and we need to wrap up and ship 6.0.0 already ;)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to