[ https://issues.apache.org/jira/browse/MAHOUT-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hao Zhong updated MAHOUT-1959:
------------------------------
    Status: Patch Available  (was: Open)

diff --git a/mr/src/main/java/org/apache/mahout/clustering/streaming/cluster/BallKMeans.java b/mr/src/main/java/org/apache/mahout/clustering/streaming/cluster/BallKMeans.java
index 25806fe..350f189 100644
--- a/mr/src/main/java/org/apache/mahout/clustering/streaming/cluster/BallKMeans.java
+++ b/mr/src/main/java/org/apache/mahout/clustering/streaming/cluster/BallKMeans.java
@@ -436,7 +436,8 @@
     }
     for (WeightedVector datapoint : datapoints) {
       Centroid closestCentroid = (Centroid) centroids.searchFirst(datapoint, false).getValue();
-      closestCentroid.setWeight(closestCentroid.getWeight() + datapoint.getWeight());
+      double closestCentroidWeight = centroids.searchFirst(datapoint, false).getWeight();
+      closestCentroid.setWeight(closestCentroidWeight + datapoint.getWeight());
     }
   }
 }


> BallKMeans.iterativeAssignment can set wrong weights.
> -----------------------------------------------------
>
>                 Key: MAHOUT-1959
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1959
>             Project: Mahout
>          Issue Type: Bug
>            Reporter: Hao Zhong
>
> I notice that the BallKMeans.iterativeAssignment method uses the following code to calculate weights:
> {code:title=BallKMeans.java|borderStyle=solid}
> for (WeightedVector datapoint : datapoints) {
>   Centroid closestCentroid = (Centroid) centroids.searchFirst(datapoint, false).getValue();
>   closestCentroid.setWeight(closestCentroid.getWeight() + datapoint.getWeight());
> }
> {code}
> In MAHOUT-1237, the buggy code calculated the weight in the same way:
> {code:title=ClusteringUtils.java|borderStyle=solid}
> for (Vector vector : datapoints) {
>   Centroid closest = (Centroid) centroids.searchFirst(vector, false).getValue();
>   totalCost += closest.getWeight();
> }
> {code}
> The fixed code is as follows:
> {code:title=ClusteringUtils.java|borderStyle=solid}
> for (Vector vector : datapoints) {
>   totalCost += centroids.searchFirst(vector, false).getWeight();
> }
> {code}
> I am not quite sure whether BallKMeans.iterativeAssignment sets the right weights. Please check it.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
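Whether the patch is right depends on what the two `getWeight()` calls return. The MAHOUT-1237 fix quoted in this issue sums `centroids.searchFirst(vector, false).getWeight()` as a cluster cost, which suggests that this weight is the distance from the query to its closest centroid, while `closestCentroid.getWeight()` is the centroid's own accumulated weight. The following is a minimal sketch that makes that distinction concrete under this assumption. `CentroidWeightSketch`, `SimpleCentroid`, `SearchResult`, `accumulateWeights`, `totalCost`, and the brute-force `searchFirst` are hypothetical stand-ins, not Mahout classes or methods.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these types or methods are Mahout's API.
// It contrasts two quantities that are both called a "weight":
//   * a centroid's own weight (total weight of the points assigned to it)
//   * the weight of a nearest-neighbour result (assumed here to be a distance)
public class CentroidWeightSketch {

  /** Stand-in for a centroid: a position plus an accumulated weight. */
  public static final class SimpleCentroid {
    public final double[] position;
    public double weight;

    public SimpleCentroid(double[] position, double weight) {
      this.position = position;
      this.weight = weight;
    }
  }

  /** Stand-in for a search result: the closest centroid and its distance to the query. */
  public static final class SearchResult {
    public final SimpleCentroid value;
    public final double distance;

    public SearchResult(SimpleCentroid value, double distance) {
      this.value = value;
      this.distance = distance;
    }
  }

  static double euclidean(double[] a, double[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /** Brute-force nearest centroid; plays the role of searchFirst(query, false). */
  public static SearchResult searchFirst(List<SimpleCentroid> centroids, double[] query) {
    SimpleCentroid best = null;
    double bestDistance = Double.POSITIVE_INFINITY;
    for (SimpleCentroid c : centroids) {
      double d = euclidean(c.position, query);
      if (d < bestDistance) {
        bestDistance = d;
        best = c;
      }
    }
    return new SearchResult(best, bestDistance);
  }

  /** Adds each point's weight to the weight its closest centroid already has. */
  public static void accumulateWeights(
      List<SimpleCentroid> centroids, double[][] points, double[] pointWeights) {
    for (int i = 0; i < points.length; i++) {
      SimpleCentroid closest = searchFirst(centroids, points[i]).value;
      closest.weight += pointWeights[i];
    }
  }

  /** Sums query-to-closest-centroid distances, the quantity the MAHOUT-1237 fix totals. */
  public static double totalCost(List<SimpleCentroid> centroids, double[][] points) {
    double cost = 0;
    for (double[] p : points) {
      cost += searchFirst(centroids, p).distance;
    }
    return cost;
  }

  public static void main(String[] args) {
    List<SimpleCentroid> centroids = new ArrayList<>();
    centroids.add(new SimpleCentroid(new double[] {0, 0}, 0));
    centroids.add(new SimpleCentroid(new double[] {10, 0}, 0));
    double[][] points = {{0, 1}, {0, -1}, {10, 2}};
    accumulateWeights(centroids, points, new double[] {1, 1, 1});
    // Centroid weights count the assigned mass: prints "2.0 1.0".
    System.out.println(centroids.get(0).weight + " " + centroids.get(1).weight);
    // Total cost sums distances 1 + 1 + 2: prints "4.0".
    System.out.println(totalCost(centroids, points));
  }
}
```

If the distance interpretation holds, the patched line would derive a centroid's new weight from a distance plus the point's weight rather than from the weight already accumulated on that centroid, so comparing the original and patched loops on a small example like this one may help settle the question raised in the issue.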