Hi James,

The ReduceDriver is configured to receive a list of inputs because lists have ordering guarantees whereas other Iterables/Collections do not; for determinism's sake, it is best to guarantee that you're calling reduce() with an ordered set of values when testing.

It would be stellar if you could improve the ReduceDriver to reuse a writable instance between calls. You'll need to infer the appropriate container class type from the first instance you see in the reducer's output, and use the serialization API to make a copy. If you look at o.a.h.mrunit.mock.MockOutputCollector, this will show a pattern you can work on.

Cheers,
- Aaron

On Thu, Dec 9, 2010 at 2:21 AM, James Hammerton <[email protected]> wrote:

> Hi,
>
> This relates to a bug we had a while back.
>
> When running a reducer, if you want to buffer the values, you normally need
> to take a copy of each value as you iterate through them. This is because
> the iterator always returns the same object but the contents of the object
> get filled with each value as the iterator steps through.
>
> However this behaviour is not reproduced by the reducer drivers in MR unit.
> Even if you give the reduce driver a List (why do we have to give a List
> when reducer specifies merely an Iterable?) designed to behave this way, MR
> unit copies the values into a normal List before presenting them to the
> reducer. At least this is the case with the 0.20.1 install we have.
>
> Anyway, in order to test our bug fix we extended the ReduceDriver class to
> actually copy the values into an iterable that does reproduce the behaviour
> so that we can test for bugs caused by failing to copy the values. In more
> recent versions of Hadoop (we use 0.20.1) is the behaviour of the reduce
> drivers altered to match that of actual running reducers in this respect?
> Are there any plans to do this? Alternatively, I'd be willing to fix this in
> the Hadoop codebase myself if necessary.
>
> Regards,
>
> James
>
> --
> James Hammerton | Senior Data Mining Engineer
> www.mendeley.com/profiles/james-hammerton
>
> Mendeley Limited | London, UK | www.mendeley.com
> Registered in England and Wales | Company Number 6419015
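
For anyone following the thread, the object-reuse behaviour described in the quoted message can be illustrated with a minimal sketch in plain Java, with no Hadoop dependency. The names here (ReuseDemo, MutableInt, ReusingIterable) are hypothetical stand-ins invented for illustration: MutableInt plays the role of a mutable Writable, and ReusingIterable mimics an iterator that refills one shared instance. This is not MRUnit or Hadoop code, only a model of the pitfall.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ReuseDemo {

    /** Hypothetical stand-in for a mutable Writable value. */
    static final class MutableInt {
        private int value;
        MutableInt(int value) { this.value = value; }
        int get() { return value; }
        void set(int value) { this.value = value; }
    }

    /** Iterable whose iterator returns the same instance every time, refilled with the next value. */
    static final class ReusingIterable implements Iterable<MutableInt> {
        private final List<Integer> source;
        ReusingIterable(List<Integer> source) { this.source = source; }

        @Override
        public Iterator<MutableInt> iterator() {
            final Iterator<Integer> it = source.iterator();
            final MutableInt shared = new MutableInt(0); // the single reused object
            return new Iterator<MutableInt>() {
                @Override public boolean hasNext() { return it.hasNext(); }
                @Override public MutableInt next() {
                    shared.set(it.next()); // overwrite contents in place
                    return shared;
                }
                @Override public void remove() { throw new UnsupportedOperationException(); }
            };
        }
    }

    /** Buggy buffering: stores references, so every entry ends up aliasing the shared object. */
    static List<Integer> bufferWithoutCopy(Iterable<MutableInt> values) {
        List<MutableInt> buffered = new ArrayList<MutableInt>();
        for (MutableInt v : values) {
            buffered.add(v);
        }
        List<Integer> out = new ArrayList<Integer>();
        for (MutableInt v : buffered) {
            out.add(v.get());
        }
        return out;
    }

    /** Correct buffering: takes a copy of each value before the iterator advances. */
    static List<Integer> bufferWithCopy(Iterable<MutableInt> values) {
        List<MutableInt> buffered = new ArrayList<MutableInt>();
        for (MutableInt v : values) {
            buffered.add(new MutableInt(v.get()));
        }
        List<Integer> out = new ArrayList<Integer>();
        for (MutableInt v : buffered) {
            out.add(v.get());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 2, 3);
        System.out.println(bufferWithoutCopy(new ReusingIterable(input))); // prints [3, 3, 3]
        System.out.println(bufferWithCopy(new ReusingIterable(input)));    // prints [1, 2, 3]
    }
}
```

A test harness that first copies the inputs into an ordinary List of distinct objects would make both methods return [1, 2, 3], which is why a reducer that forgets to copy can pass under such a harness yet fail when the values are reused.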
