[
https://issues.apache.org/jira/browse/LUCENE-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789559#comment-13789559
]
Uwe Schindler edited comment on LUCENE-5258 at 10/8/13 7:08 PM:
----------------------------------------------------------------
Hi Robert, hi Ted,
if I have some time later, I will post another "sloppy" distance function
that is still nearly correct (also near the poles) and works very well for
distances up to 500 km: the "Polar coordinate flat-Earth formula" (see
[http://en.wikipedia.org/wiki/Geographical_distance#Polar_coordinate_flat-Earth_formula]).
It needs only one cosine and one square root, which can be a table
lookup and the native sqrt() processor instruction, respectively.
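
For illustration, here is a minimal sketch of the polar coordinate flat-Earth
formula as described on the linked Wikipedia page. It is not the code from the
patch; the class name, the method name and the mean Earth radius constant are
assumptions, and it calls Math.cos() directly where a table lookup could be
substituted.

```java
final class FlatEarthDistance {
    // Mean Earth radius in kilometres (assumed value, not taken from the issue).
    private static final double EARTH_RADIUS_KM = 6371.0088;

    /**
     * Polar coordinate flat-Earth distance in kilometres.
     * Inputs are latitude/longitude in degrees.
     */
    static double polarFlatEarthKm(double lat1Deg, double lon1Deg,
                                   double lat2Deg, double lon2Deg) {
        // Colatitudes (angle from the north pole) in radians.
        double theta1 = Math.PI / 2 - Math.toRadians(lat1Deg);
        double theta2 = Math.PI / 2 - Math.toRadians(lat2Deg);
        double deltaLon = Math.toRadians(lon2Deg - lon1Deg);

        // Law of cosines in the plane: the only trigonometric call is one cosine.
        double sq = theta1 * theta1 + theta2 * theta2
                - 2 * theta1 * theta2 * Math.cos(deltaLon);

        // Guard against a tiny negative value caused by rounding.
        return EARTH_RADIUS_KM * Math.sqrt(Math.max(0.0, sq));
    }

    public static void main(String[] args) {
        // Two points one degree of latitude apart on the same meridian.
        System.out.println(polarFlatEarthKm(0.0, 10.0, 1.0, 10.0));
    }
}
```

Along a single meridian the cosine term is 1, so the expression collapses to
the plain difference of the two colatitudes times the radius.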
This formula is a good fit for scoring. If you just multiply something
like 1/distance or 1/ln(distance) into your score, the precision is not really
important. We use the above formula for that. This way you can find
all places around some coordinate, but results that match better
(similarity-wise) still score higher even though they are farther away, and
vice versa (my favourite example: if a user searches for "vegetarian pizza", a
pizzeria named "veggi pizza place" should score higher than "meat pizza place",
even if it is farther away). As our scoring factors are floats and the norms are
just 8-bit floats, the distance can be simplified a lot. In our case, we just
needed a distance that is also good near the poles, so the polar coordinate
flat-Earth formula fits well (and is still quite accurate, also near the poles).
In addition, if you combine the score like score/ln(distance), you can remove
the sqrt from the formula, too, because ln(sqrt(x)) = 0.5 * ln(x), so the
square root just becomes a constant factor!
This works well for similarity matches combined with distance.
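
The sqrt removal can be sketched as follows. This is only an illustration
under assumptions: the class and method names are hypothetical, the Earth
radius is an assumed mean value, and the damping uses the 1 + ln(distance)
form from the issue description rather than a bare ln(distance), which keeps
the denominator positive for distances of 1 km or more.

```java
final class SqrtFreeDistanceScore {
    // Mean Earth radius in kilometres (assumed value).
    private static final double EARTH_RADIUS_KM = 6371.0088;

    /** Squared polar flat-Earth distance in km^2; no square root is taken. */
    static double polarFlatEarthKmSquared(double lat1Deg, double lon1Deg,
                                          double lat2Deg, double lon2Deg) {
        double theta1 = Math.PI / 2 - Math.toRadians(lat1Deg);
        double theta2 = Math.PI / 2 - Math.toRadians(lat2Deg);
        double deltaLon = Math.toRadians(lon2Deg - lon1Deg);
        double sq = theta1 * theta1 + theta2 * theta2
                - 2 * theta1 * theta2 * Math.cos(deltaLon);
        return EARTH_RADIUS_KM * EARTH_RADIUS_KM * Math.max(0.0, sq);
    }

    /** Reference version: takes the square root, then the logarithm. */
    static double scoreWithSqrt(double textScore, double distSq) {
        return textScore / (1.0 + Math.log(Math.sqrt(distSq)));
    }

    /** Same value without sqrt, using ln(sqrt(x)) = 0.5 * ln(x). */
    static double scoreWithoutSqrt(double textScore, double distSq) {
        return textScore / (1.0 + 0.5 * Math.log(distSq));
    }

    public static void main(String[] args) {
        // Hypothetical text score and two example coordinates.
        double distSq = polarFlatEarthKmSquared(52.52, 13.405, 53.551, 9.993);
        System.out.println(scoreWithSqrt(2.0, distSq));
        System.out.println(scoreWithoutSqrt(2.0, distSq));
    }
}
```

Both methods should agree up to floating-point rounding; the second one trades
the square root for a single multiplication by 0.5.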
> add distance function to expressions/
> -------------------------------------
>
> Key: LUCENE-5258
> URL: https://issues.apache.org/jira/browse/LUCENE-5258
> Project: Lucene - Core
> Issue Type: New Feature
> Components: modules/other
> Reporter: Robert Muir
> Fix For: 5.0, 4.6
>
> Attachments: LUCENE-5258.patch
>
>
> Adding this static function makes it really easy to incorporate distance with
> the score or other signals in arbitrary ways, e.g. score / (1 +
> log(distance)) or whatever.