I found out what my problem was. Apparently, when you iterate over the Iterable<Type> values, the same instance of Type is reused for every element. For example, in my reducer,

  public void reduce(Key key, Iterable<Value> values, Context context)
      throws IOException, InterruptedException {
    Iterator<Value> it = values.iterator();
    Value a = it.next();
    Value b = it.next();
  }

the variables a and b, both of type Value, will be the same object instance! I suppose the iterator behaves this way as an optimization, so that it does not have to allocate a new object for every value.

On Thu, Apr 5, 2012 at 4:55 PM, Jane Wayne <jane.wayne2...@gmail.com> wrote:
> i am currently testing my map reduce job on Windows + Cygwin + Hadoop
> v0.20.205. for some strange reason, the list of values (i.e.
> Iterable<T> values) going into the reducer looks all wrong. i have
> tracked the map reduce process with logging statements (i.e. logged
> the input to the map, logged the output from the map, logged the
> partitioner, logged the input to the reducer). at all stages,
> everything looks correct except at the reducer.
>
> is there anyway (using Windows + Cygwin) to view the local map
> outputs before they are shuffled/sorted to the reducer? i need to know
> why the values are incorrect.
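
For anyone who hits the same thing, here is a minimal sketch of the reuse described above, written without any Hadoop dependency so it runs on its own. Everything in it is made up for illustration: Value is a stand-in for a mutable Writable, and ReusingIterable only imitates the behavior I observed in the reducer; it is not how Hadoop actually implements its iterator. It shows that keeping the references leaves you with one object holding the last value, while copying each element keeps the values intact.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class ReuseDemo {

    // Hypothetical mutable value type, standing in for a Hadoop Writable.
    static final class Value {
        int n;
        Value(int n) { this.n = n; }
        Value(Value other) { this.n = other.n; }  // copy constructor
    }

    // Hypothetical iterable that refills ONE shared instance on every next(),
    // imitating the object reuse seen in the reducer (an assumption, not Hadoop code).
    static final class ReusingIterable implements Iterable<Value> {
        private final int[] data;
        private final Value shared = new Value(0);

        ReusingIterable(int... data) { this.data = data; }

        @Override
        public Iterator<Value> iterator() {
            return new Iterator<Value>() {
                private int i = 0;

                @Override
                public boolean hasNext() { return i < data.length; }

                @Override
                public Value next() {
                    if (!hasNext()) throw new NoSuchElementException();
                    shared.n = data[i++];  // overwrite the shared object in place
                    return shared;         // same reference every time
                }
            };
        }
    }

    // Buggy pattern: keeps the references handed out by the iterator.
    static List<Value> keepReferences(Iterable<Value> values) {
        List<Value> out = new ArrayList<>();
        for (Value v : values) out.add(v);
        return out;
    }

    // Fix: copies each value before keeping it.
    static List<Value> keepCopies(Iterable<Value> values) {
        List<Value> out = new ArrayList<>();
        for (Value v : values) out.add(new Value(v));
        return out;
    }

    public static void main(String[] args) {
        List<Value> refs = keepReferences(new ReusingIterable(1, 2, 3));
        List<Value> copies = keepCopies(new ReusingIterable(1, 2, 3));

        System.out.println(refs.get(0) == refs.get(1));  // true: one shared object
        System.out.println(refs.get(0).n);               // 3: only the last value survives
        System.out.println(copies.get(0).n + "," + copies.get(1).n + "," + copies.get(2).n);  // 1,2,3
    }
}
```

In a real reducer the equivalent fix should be to copy each value before storing it, for example with the type's copy constructor where it has one (such as new Text(text)) or with WritableUtils.clone(value, conf). I have not verified which of these suits every Writable type.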