javaloveme created MAPREDUCE-6827:
-------------------------------------

             Summary: Failed to traverse Iterable values the second time in reduce() method
                 Key: MAPREDUCE-6827
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6827
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: task
    Affects Versions: 3.0.0-alpha1
         Environment: hadoop2.7.3
            Reporter: javaloveme


Failed to traverse Iterable values the second time in reduce() method

The following code is a reduce() method (of WordCount):

        public static class WcReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

                @Override
                protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                                throws IOException, InterruptedException {

                        // print some logs
                        List<String> vals = new LinkedList<>();
                        for(IntWritable i : values) {
                                vals.add(i.toString());
                        }
                        System.out.println(String.format(">>>> reduce(%s, [%s])",
                                        key, String.join(", ", vals)));

                        // sum of values
                        int sum = 0;
                        for(IntWritable i : values) {
                                sum += i.get();
                        }
                        System.out.println(String.format(">>>> reduced(%s, %s)",
                                        key, sum));
                        
                        context.write(key, new IntWritable(sum));
                }                       
        }

After running it, every sum in the output was zero!

Debugging showed that the body of the second for-each loop was never executed.
The root cause is the value returned by Iterable.iterator(): both for-each
loops received the same Iterator instance, so it was already exhausted when
the second loop started. In general, Iterable.iterator() should return a new
instance on each call, as ArrayList.iterator() does.
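
The behavior can be reproduced without Hadoop. The following is only a minimal sketch under stated assumptions: SinglePassIterable and SinglePassDemo are hypothetical names introduced for illustration, not Hadoop classes, and the stand-in simply returns one shared Iterator on every call, as described above. It also shows one possible workaround, which is to collect the log strings and the sum in a single pass.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the values object described in this report.
// It is NOT Hadoop code: iterator() hands back the same instance every time.
class SinglePassIterable<T> implements Iterable<T> {
    private final Iterator<T> shared;

    SinglePassIterable(List<T> source) {
        this.shared = source.iterator();
    }

    @Override
    public Iterator<T> iterator() {
        return shared; // same, possibly exhausted, iterator on every call
    }
}

class SinglePassDemo {

    // Mirrors the reported reduce(): two for-each loops over the same values.
    static int sumAfterLogging(Iterable<Integer> values) {
        List<String> vals = new ArrayList<>();
        for (Integer i : values) {
            vals.add(i.toString());
        }
        int sum = 0;
        for (Integer i : values) { // never entered if the iterator is shared
            sum += i;
        }
        return sum;
    }

    // Workaround sketch: gather the log strings and the sum in one pass.
    static int sumInOnePass(Iterable<Integer> values, List<String> log) {
        int sum = 0;
        for (Integer i : values) {
            log.add(i.toString());
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 1, 1);

        // Shared iterator: the second loop sees nothing, so the sum stays 0.
        System.out.println(sumAfterLogging(new SinglePassIterable<>(data)));

        // A List returns a fresh iterator per call, so the sum is 3.
        System.out.println(sumAfterLogging(data));

        // Single pass over the shared iterator: the sum is 3 again.
        List<String> log = new ArrayList<>();
        System.out.println(sumInOnePass(new SinglePassIterable<>(data), log));
        System.out.println(String.join(", ", log));
    }
}
```

In the reducer above, the equivalent change would be to move the sum += i.get() statement into the first loop, so that values is traversed only once.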





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

