MRUnit Should Sort Reduce Input
-------------------------------
Key: MAPREDUCE-1216
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1216
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 0.20.1
Environment: Cloudera Distribution for Hadoop 0.20.1 + 133
Reporter: Ed Kohlwey
MRUnit should sort the input for a reduce task the same way Hadoop does.
This matters for any reduce task that relies on input order, for instance one
that removes duplicate key/value pairs by comparing each value with the previous one.
Example:
{code:java}
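import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

class BadReducer extends Reducer<Text, Text, Text, Text> {
  @Override
  public void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    // Only the previous value is remembered, so a duplicate is dropped
    // only when it arrives directly after its twin (i.e. on sorted input).
    Text last = new Text();
    for (Text text : values) {
      if (!text.equals(last)) {
        context.write(key, text);
        last.set(text);
      }
    }
  }
}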
{code}
{code:java}
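ReduceDriver<Text, Text, Text, Text> driver =
    new ReduceDriver<Text, Text, Text, Text>(new BadReducer());
driver.setInputKey(new Text("foo"));
// The duplicate "bar" values are adjacent.
driver.addInputValue(new Text("bar"));
driver.addInputValue(new Text("bar"));
driver.addInputValue(new Text("foo"));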
{code}
produces different results than
{code:java}
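ReduceDriver<Text, Text, Text, Text> driver =
    new ReduceDriver<Text, Text, Text, Text>(new BadReducer());
driver.setInputKey(new Text("foo"));
// Same values, but the duplicate "bar" values are separated by "foo".
driver.addInputValue(new Text("bar"));
driver.addInputValue(new Text("foo"));
driver.addInputValue(new Text("bar"));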
{code}
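To make the difference concrete without a Hadoop or MRUnit dependency, here is a minimal standalone sketch of the same adjacent-comparison logic applied to plain strings. The class and method names (OrderSensitiveDedup, dedupAdjacent, sortThenDedup) are hypothetical and exist only for illustration; sortThenDedup is just one guess at what "sort the reduce input" could look like, not MRUnit's actual behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration class; not part of MRUnit or Hadoop.
class OrderSensitiveDedup {
    // Same idea as the loop in BadReducer: keep a value only if it
    // differs from the value seen immediately before it.
    static List<String> dedupAdjacent(List<String> values) {
        List<String> out = new ArrayList<String>();
        String last = null;
        for (String v : values) {
            if (!v.equals(last)) {
                out.add(v);
                last = v;
            }
        }
        return out;
    }

    // Assumed fix: sort a copy of the values before running the same logic.
    static List<String> sortThenDedup(List<String> values) {
        List<String> sorted = new ArrayList<String>(values);
        Collections.sort(sorted);
        return dedupAdjacent(sorted);
    }

    public static void main(String[] args) {
        // Duplicates adjacent: prints [bar, foo]
        System.out.println(dedupAdjacent(Arrays.asList("bar", "bar", "foo")));
        // Duplicates separated: prints [bar, foo, bar]
        System.out.println(dedupAdjacent(Arrays.asList("bar", "foo", "bar")));
        // Sorted first, order no longer matters: prints [bar, foo]
        System.out.println(sortThenDedup(Arrays.asList("bar", "foo", "bar")));
    }
}
```

With the values in the order bar, foo, bar, the second "bar" slips through because it is never compared against the first one. Sorting beforehand makes both orderings produce the same output.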