[ 
https://issues.apache.org/jira/browse/MAHOUT-643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013203#comment-13013203
 ] 

Sean Owen commented on MAHOUT-643:
----------------------------------

For avoidance of doubt, I think the similarity computation would be better 
written as:

// Items in exactly one of the two preference sets, i.e. the city-block distance
int distance = pref1 + pref2 - 2 * intersection;
if (distance == 0) {
  return 1.0;
} else {
  return 1.0 / distance - 1.0;
}

It's equivalent and more direct. But the range of output values is odd.
Perfect overlap gives a similarity of 1.0, which is good. Anything less starts
at 0.0 and falls towards -1.0 as similarity decreases. This jump from 1.0 to
0.0 seems unnecessary, even if the output is at least directionally correct.
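To make the jump concrete, here is a small sketch that evaluates the piecewise
formula above for city-block distances 0 through 4. The class and method names
are hypothetical and are not taken from the patch or from Mahout. By the
formula, these distances map to 1.0, 0.0, -0.5, about -0.667 and -0.75, so the
step from 1.0 down to 0.0 at distance 1 is the discontinuity in question.

```java
/**
 * Hypothetical demo class: evaluates the piecewise similarity from the
 * comment for a few small city-block distances.
 */
class PiecewiseCityBlock {

  // The formula as written in the comment: 1.0 for identical preference
  // sets, otherwise 1/distance - 1, which lies in (-1.0, 0.0].
  static double fromDistance(int distance) {
    if (distance == 0) {
      return 1.0;
    } else {
      return 1.0 / distance - 1.0;
    }
  }

  // pref1 and pref2 are the sizes of the two preference sets and
  // intersection is the size of their overlap, as in the comment.
  static double similarity(int pref1, int pref2, int intersection) {
    int distance = pref1 + pref2 - 2 * intersection;
    return fromDistance(distance);
  }

  public static void main(String[] args) {
    // Print the similarity for distances 0..4 to show the 1.0 -> 0.0 jump.
    for (int d = 0; d <= 4; d++) {
      System.out.printf("distance=%d similarity=%.3f%n", d, fromDistance(d));
    }
  }
}
```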

I'd either like to understand why this is desirable, or else use something more 
"conventional" like:

int distance = pref1 + pref2 - 2 * intersection;
return 1.0 / (1.0 + distance);
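For comparison, a minimal sketch of the conventional mapping, computing pref1,
pref2 and intersection from two sets of item IDs. It assumes plain java.util
sets (Java 9+ for Set.of) rather than Mahout's DataModel, and the class and
method names are illustrative only. Distances 0, 1, 2 and 3 map to 1.0, 0.5,
1/3 and 0.25, so the output stays in (0, 1] and shrinks smoothly as the
distance grows, with no jump after a perfect match.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch of the 1 / (1 + distance) similarity, computed from
 * two sets of item IDs instead of Mahout's own data structures.
 */
class ConventionalCityBlock {

  // Maps a city-block distance >= 0 into (0, 1]; distance 0 gives 1.0.
  static double fromDistance(int distance) {
    return 1.0 / (1.0 + distance);
  }

  // Hypothetical helper: derives pref1, pref2 and intersection from sets.
  static double cityBlockSimilarity(Set<Long> itemsA, Set<Long> itemsB) {
    Set<Long> common = new HashSet<>(itemsA);
    common.retainAll(itemsB);
    int pref1 = itemsA.size();
    int pref2 = itemsB.size();
    int intersection = common.size();
    int distance = pref1 + pref2 - 2 * intersection;
    return fromDistance(distance);
  }

  public static void main(String[] args) {
    Set<Long> a = Set.of(1L, 2L, 3L, 4L);
    Set<Long> b = Set.of(1L, 2L, 3L, 5L);
    // distance = 4 + 4 - 2 * 3 = 2, so the similarity is 1 / 3
    System.out.println(cityBlockSimilarity(a, b));
    for (int d = 0; d <= 4; d++) {
      System.out.println("distance=" + d + " similarity=" + fromDistance(d));
    }
  }
}
```

One trade-off of this form: it never reaches 0.0, so even two users with no
overlap at all still get a small positive similarity.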

> Adding CityBlockSimilarity and DistributedCityBlockDistanceVectorSimilarity
> ---------------------------------------------------------------------------
>
>                 Key: MAHOUT-643
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-643
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering, Math
>    Affects Versions: 0.5
>            Reporter: Daniel McEnnis
>            Priority: Minor
>              Labels: distance, patch, similarity
>             Fix For: 0.5
>
>         Attachments: patch4.txt
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> adding a new distance metric to the 0.5 branch

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
