[ 
https://issues.apache.org/jira/browse/LUCENE-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789559#comment-13789559
 ] 

Uwe Schindler edited comment on LUCENE-5258 at 10/8/13 7:08 PM:
----------------------------------------------------------------

Hi Robert, hi Ted,
if I have some time later, I will post another "sloppy" distance function 
that is still almost correct (also near the poles) and works very well for 
distances up to 500 km: the "Polar coordinate flat-Earth formula" (see 
[http://en.wikipedia.org/wiki/Geographical_distance#Polar_coordinate_flat-Earth_formula])

This one needs only one cosine and one square root, which can be done with a 
table lookup and the native sqrt() processor instruction.
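
As a rough illustration of the formula referenced above, here is a minimal 
sketch in Java. It is not the code from the attached patch: the class name, 
the method names and the Earth radius constant are assumptions made for this 
example, and it calls Math.cos() directly instead of using a lookup table.

```java
// Hypothetical sketch of the polar coordinate flat-Earth formula.
// Names and the radius constant are illustrative assumptions, not Lucene API.
final class FlatEarthDistance {
    // Mean Earth radius in kilometers (assumed value).
    static final double EARTH_RADIUS_KM = 6371.0088;

    /**
     * Squared angular distance in radians^2.
     * Inputs are latitude/longitude in degrees.
     */
    public static double squaredAngle(double lat1, double lon1,
                                      double lat2, double lon2) {
        // Colatitude: angle measured from the north pole instead of the equator.
        double theta1 = Math.PI / 2 - Math.toRadians(lat1);
        double theta2 = Math.PI / 2 - Math.toRadians(lat2);
        double deltaLon = Math.toRadians(lon2 - lon1);
        // Law of cosines in the plane: the only trigonometric call is one cosine.
        double sq = theta1 * theta1 + theta2 * theta2
                - 2 * theta1 * theta2 * Math.cos(deltaLon);
        // Guard against tiny negative values caused by floating-point rounding.
        return Math.max(sq, 0.0);
    }

    /** Approximate distance in kilometers: one cosine plus one square root. */
    public static double distanceKm(double lat1, double lon1,
                                    double lat2, double lon2) {
        return EARTH_RADIUS_KM * Math.sqrt(squaredAngle(lat1, lon1, lat2, lon2));
    }
}
```

Keeping squaredAngle() separate from distanceKm() makes it easy to skip the 
square root when a caller does not need the distance itself.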

This formula is perfect for scoring. If you just multiply your score by 
something like 1/distance or 1/ln(distance), the precision of the distance is 
not really important. We use the above formula for that. This way it is 
possible to find all places around some coordinate, but still have results 
that match better (similarity-wise) score higher even though they are farther 
away, and vice versa (my favourite example: if a user searches for "vegetarian 
pizza", a pizzeria with the name "veggi pizza place" should score higher than 
"meat pizza place", although it is farther away). As our scoring factors are 
floats and the norms are just 8-bit floats, the distance can be simplified a 
lot. In our case, we just needed a distance that is also good near the poles, 
so the polar coordinate flat-Earth formula is perfect (and still very 
accurate, also near the poles).
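
To make the pizza example concrete, here is a small sketch that damps a text 
score with the score / (1 + log(distance)) form from the issue description. 
The class, the method and the sample scores and distances are made up for 
illustration; clamping the distance to at least 1 km is also an assumption, 
added so that the logarithm never goes negative.

```java
// Hypothetical sketch: combine a text relevance score with a distance.
// All names and numbers here are illustrative assumptions.
final class DistanceDampedScore {
    /** Returns textScore / (1 + ln(distance)), with distance clamped to >= 1 km. */
    public static double combine(double textScore, double distanceKm) {
        double clamped = Math.max(distanceKm, 1.0); // assumption: avoid ln(x) < 0
        return textScore / (1.0 + Math.log(clamped));
    }

    public static void main(String[] args) {
        // Made-up values: the better text match is farther away.
        double veggie = combine(2.0, 5.0); // "veggi pizza place", 5 km away
        double meat = combine(0.5, 1.0);   // "meat pizza place", 1 km away
        System.out.println("veggie=" + veggie + " meat=" + meat);
    }
}
```

With these made-up numbers the better text match still ranks first, while a 
plain 1/distance factor (2.0/5 versus 0.5/1) would have put the closer place 
on top, so the choice of damping function matters more than the last digits 
of the distance.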

In addition, if you combine the score like score/ln(distance), you can remove 
the sqrt from the formula, too, because ln(sqrt(x)) = 0.5 * ln(x): the square 
root just becomes a constant factor! This is perfect for similarity matches 
combined with distance.
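
The logarithm shortcut can be sketched as follows, assuming the squared 
angular distance (in radians^2) from the flat-Earth formula is already 
available. The class name, method names and radius constant are again 
hypothetical.

```java
// Hypothetical sketch: ln(R * sqrt(s)) == ln(R) + 0.5 * ln(s), so no sqrt is needed.
final class LogDistance {
    // Mean Earth radius in kilometers (assumed value).
    static final double EARTH_RADIUS_KM = 6371.0088;
    // ln(R) is a constant and can be computed once.
    static final double LOG_RADIUS = Math.log(EARTH_RADIUS_KM);

    /** Squared angular distance in radians^2; inputs in degrees. */
    public static double squaredAngle(double lat1, double lon1,
                                      double lat2, double lon2) {
        double theta1 = Math.PI / 2 - Math.toRadians(lat1); // colatitude
        double theta2 = Math.PI / 2 - Math.toRadians(lat2);
        double deltaLon = Math.toRadians(lon2 - lon1);
        double sq = theta1 * theta1 + theta2 * theta2
                - 2 * theta1 * theta2 * Math.cos(deltaLon);
        return Math.max(sq, 0.0); // guard against rounding below zero
    }

    /** ln(distance in km) computed without a square root; needs squaredAngle > 0. */
    public static double logDistanceKm(double squaredAngle) {
        return LOG_RADIUS + 0.5 * Math.log(squaredAngle);
    }
}
```

Note that ln(distance) is zero at 1 km and negative below it, so a real 
scoring expression would need an offset or a clamp, as in the 
score / (1 + log(distance)) form quoted in the issue description.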


> add distance function to expressions/
> -------------------------------------
>
>                 Key: LUCENE-5258
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5258
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/other
>            Reporter: Robert Muir
>             Fix For: 5.0, 4.6
>
>         Attachments: LUCENE-5258.patch
>
>
> Adding this static function makes it really easy to incorporate distance with 
> the score or other signals in arbitrary ways, e.g. score / (1 + 
> log(distance)) or whatever.


