[
https://issues.apache.org/jira/browse/MAHOUT-643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013829#comment-13013829
]
Sebastian Schelter commented on MAHOUT-643:
-------------------------------------------
I see a few small todos left:
* Naming should be consistent: either CityBlockDistanceSimilarity and
DistributedCityBlockDistanceVectorSimilarity, or CityBlockSimilarity and
DistributedCityBlockVectorSimilarity.
* A new entry should be added to
org.apache.mahout.math.hadoop.similarity.SimilarityType.
* The "distributed" implementation is not correct IMHO: the output of
weight() is the number of users preferring a single item, and the number of
cooccurrences is the size of the intersection, so the distance is the sum of
the two weights minus twice the cooccurrence count.
It should look like this:
... extends AbstractDistributedVectorSimilarity ...

  @Override
  protected double doComputeResult(int rowA, int rowB, Iterable<Cooccurrence> cooccurrences,
      double weightOfVectorA, double weightOfVectorB, int numberOfColumns) {
    int cooccurrenceCount = countElements(cooccurrences);
    if (cooccurrenceCount == 0) {
      return Double.NaN;
    }
    // the weights are doubles, so the distance has to be a double as well
    double distance = weightOfVectorA + weightOfVectorB - 2 * cooccurrenceCount;
    return 1.0 / (1.0 + distance);
  }

  @Override
  public double weight(Vector v) {
    // number of non-zero entries, i.e. the number of users preferring this item
    return (double) countElements(v.iterateNonZero());
  }
> Adding CityBlockSimilarity and DistributedCityBlockDistanceVectorSimilarity
> ---------------------------------------------------------------------------
>
> Key: MAHOUT-643
> URL: https://issues.apache.org/jira/browse/MAHOUT-643
> Project: Mahout
> Issue Type: Improvement
> Components: Clustering, Math
> Affects Versions: 0.5
> Reporter: Daniel McEnnis
> Assignee: Sean Owen
> Priority: Minor
> Labels: distance, patch, similarity
> Fix For: 0.5
>
> Attachments: MAHOUT-643-2.patch, MAHOUT-643.patch, patch4.txt
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> adding a new distance metric to the 0.5 branch