I'm trying to write a Reducer which will eliminate duplicates from the list of values before writing them out. I have the following code for my Reducer:
/*****************/
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class ClickStreamIndexerReducer extends Reducer<Text, Text, Text, Text> {

    @Override
    public void reduce(Text dirName, Iterable<Text> values,
                       Reducer<Text, Text, Text, Text>.Context context)
            throws IOException, InterruptedException {
        Text value = new Text();
        Text lastValue = new Text();
        Iterator<Text> valuesIterator = values.iterator();
        while (valuesIterator.hasNext()) {
            value = valuesIterator.next();
            // Write the value only when it differs from the previous one.
            if (!value.equals(lastValue)) {
                context.write(dirName, value);
                lastValue = value;
            }
        }
    }
}
/*****************/

Right before "value = valuesIterator.next()" is called for the first time, both value and lastValue are empty, as expected. After the call, value holds the first value and lastValue is still empty. After I write out value, I set lastValue to value. So the first pass through the loop goes as expected.

However, on the next pass, when "value = valuesIterator.next()" is called, value and lastValue both end up referring to the exact same object. On every pass after that, whenever value is set, lastValue changes to the same thing.
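The behaviour described above is what you would see if the iterator hands back one reused object on every call to next() and only overwrites its contents, which is how Hadoop treats the values passed to reduce. The sketch below reproduces that without Hadoop on the classpath. Everything in it is an assumption made for illustration: MutableText is a hypothetical stand-in for Text, reusingIterator is a hypothetical iterator that recycles a single instance, and the two dedupe methods are illustrative names. It only drops adjacent duplicates, as the loop in the question does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class ReuseDemo {

    /** Hypothetical stand-in for a mutable Text: contents can be overwritten in place. */
    static final class MutableText {
        private String content = "";
        void set(String s) { content = s; }
        void set(MutableText other) { content = other.content; }
        @Override public boolean equals(Object o) {
            return o instanceof MutableText && ((MutableText) o).content.equals(content);
        }
        @Override public int hashCode() { return content.hashCode(); }
        @Override public String toString() { return content; }
    }

    /** Hypothetical iterator that returns the SAME instance from every next() call. */
    static Iterator<MutableText> reusingIterator(List<String> data) {
        final Iterator<String> src = data.iterator();
        final MutableText shared = new MutableText();
        return new Iterator<MutableText>() {
            @Override public boolean hasNext() { return src.hasNext(); }
            @Override public MutableText next() { shared.set(src.next()); return shared; }
        };
    }

    /** Keeps a reference: after the first write, lastValue and value are one object. */
    public static List<String> dedupeAliased(List<String> data) {
        List<String> out = new ArrayList<>();
        MutableText lastValue = new MutableText();
        Iterator<MutableText> it = reusingIterator(data);
        while (it.hasNext()) {
            MutableText value = it.next();
            if (!value.equals(lastValue)) {
                out.add(value.toString());
                lastValue = value; // aliases the shared instance
            }
        }
        return out;
    }

    /** Copies the contents instead, so lastValue keeps its own state. */
    public static List<String> dedupeCopied(List<String> data) {
        List<String> out = new ArrayList<>();
        MutableText lastValue = new MutableText();
        Iterator<MutableText> it = reusingIterator(data);
        while (it.hasNext()) {
            MutableText value = it.next();
            if (!value.equals(lastValue)) {
                out.add(value.toString());
                lastValue.set(value); // copy contents, not the reference
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> in = Arrays.asList("a", "a", "b", "b", "c");
        System.out.println(dedupeAliased(in)); // prints [a]
        System.out.println(dedupeCopied(in));  // prints [a, b, c]
    }
}
```

In the reducer itself, the equivalent change would probably be to replace "lastValue = value" with "lastValue.set(value)", since Text has a set(Text) method that copies the bytes. Note that comparing against the previous value only removes duplicates that arrive next to each other; if the values are not grouped, collecting them in a Set of Strings may be a better fit.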