javaloveme created MAPREDUCE-6827:
-------------------------------------
Summary: Failed to traverse Iterable values the second time in reduce() method
Key: MAPREDUCE-6827
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6827
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: task
Affects Versions: 3.0.0-alpha1
Environment: Hadoop 2.7.3
Reporter: javaloveme
Failed to traverse Iterable values the second time in reduce() method
The following reduce() method is taken from a WordCount job:
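import java.io.IOException;
import java.util.LinkedList;
import java.util.List;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

    public static class WcReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // First pass: collect the values so they can be logged
            List<String> vals = new LinkedList<>();
            for (IntWritable i : values) {
                vals.add(i.toString());
            }
            System.out.println(String.format(">>>> reduce(%s, [%s])",
                    key, String.join(", ", vals)));

            // Second pass: sum the values (in the reported run, this loop body never executes)
            int sum = 0;
            for (IntWritable i : values) {
                sum += i.get();
            }
            System.out.println(String.format(">>>> reduced(%s, %s)", key, sum));
            context.write(key, new IntWritable(sum));
        }
    }
}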
After running the job, every sum in the output was zero.
Debugging showed that the body of the second for-each loop never executed. The
root cause is the value returned by Iterable.iterator(): both for-each loops
received the same iterator instance, which the first loop had already
exhausted. In general, Iterable.iterator() should return a new iterator on each
call, as ArrayList.iterator() does.
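The behavior can be reproduced without Hadoop. The sketch below is plain Java and only assumes that the reducer's values behave as described above, that is, one iterator is handed out for every call to iterator(). The names SharedIteratorIterable, twoPasses, and bufferedTwoPasses are hypothetical and chosen for illustration. The sketch contrasts the shared-iterator case with a java.util.List, whose iterator() returns a fresh iterator on each call, and with a buffering workaround that consumes the values only once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class IterableReuseDemo {

    // Hypothetical Iterable that returns the SAME iterator on every call,
    // mimicking the behavior described in this issue.
    static final class SharedIteratorIterable implements Iterable<Integer> {
        private final Iterator<Integer> shared;

        SharedIteratorIterable(List<Integer> data) {
            this.shared = data.iterator();
        }

        @Override
        public Iterator<Integer> iterator() {
            return shared; // a second for-each loop gets an already-exhausted iterator
        }
    }

    // Two for-each passes over the same Iterable, like the reducer above.
    // Returns {number of values seen in pass 1, sum computed in pass 2}.
    static int[] twoPasses(Iterable<Integer> values) {
        int count = 0;
        for (int v : values) {
            count++;
        }
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return new int[] {count, sum};
    }

    // Workaround sketch: iterate the values once, buffering copies for later passes.
    static int[] bufferedTwoPasses(Iterable<Integer> values) {
        List<Integer> buffer = new ArrayList<>();
        for (int v : values) {
            buffer.add(v);
        }
        int sum = 0;
        for (int v : buffer) {
            sum += v;
        }
        return new int[] {buffer.size(), sum};
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 1, 1);

        int[] shared = twoPasses(new SharedIteratorIterable(data));
        int[] fresh = twoPasses(data); // List.iterator() returns a new iterator per call
        int[] buffered = bufferedTwoPasses(new SharedIteratorIterable(data));

        System.out.println("shared:   count=" + shared[0] + " sum=" + shared[1]);
        System.out.println("fresh:    count=" + fresh[0] + " sum=" + fresh[1]);
        System.out.println("buffered: count=" + buffered[0] + " sum=" + buffered[1]);
    }
}
```

Running main prints count=3 for all three cases, but sum=0 for the shared iterator and sum=3 for the fresh-iterator and buffered cases, which matches the all-zero sums reported here. If the buffering approach is used inside a real reducer, it is safer to buffer copies of the primitive values (for example i.get()) rather than the IntWritable references, since Hadoop is generally understood to reuse the same value object across iterations.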