liutengfei created ABDERA-303:
---------------------------------
Summary: CIReducer in kmeans doesn't work well
Key: ABDERA-303
URL: https://issues.apache.org/jira/browse/ABDERA-303
Project: Abdera
Issue Type: Bug
Environment: hadoop:
hadoop-2.0.0-alpha: pseudo cluster and single node cluster
hadoop-1.0.3: pseudo cluster
hadoop-0.20.2:pseudo cluster
mahout:
mahout-0.7
os:
ubuntu 11.04
jdk:
jdk1.6.0_27
Reporter: liutengfei
the function reduce in mahout-0.7-kmeans-CIReducer.java doesn't work well as it
looks like.
protected void reduce(IntWritable key, Iterable<ClusterWritable> values,
Context context) throws IOException,
InterruptedException {
Iterator<ClusterWritable> iter = values.iterator();
ClusterWritable first = null;
while (iter.hasNext()) {
ClusterWritable cw = iter.next();
if (first == null) {
first = cw;
} else {
first.getValue().observe(cw.getValue());
}
}
List<Cluster> models = new ArrayList<Cluster>();
models.add(first.getValue());
classifier = new ClusterClassifier(models, policy);
classifier.close();
context.write(key, first);
}
Apparently, the variable "first" will collect all output data of maps. Actually
but, the value of "first" will change after the code "ClusterWritable cw =
iter.next();", same with this new variable "cw"! I don't why but running result
shows that the code runs looks like this:"ClusterWritable cw = first =
iter.next();".
"cw" is a reference a to "iter"?
is "iter.next" just change the value of "iter" itself to the next?
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira