er, +CC mapreduce-dev... - A
On Wed, Jul 22, 2009 at 8:17 PM, Aaron Kimball<[email protected]> wrote: > +CC mapred-dev > > Hm.. Making this change is actually really difficult. > > After changing Mapper.java, I understand why this was made a > non-static member. By making Context non-static, it can inherit from > MapContext<W, X, Y, Z> and bind to the type qualifiers already > specified in the class definition. So you can't make Context static > because it needs to inherit the type parameters from the Mapper class. > (Since Mapper has type qualifiers, it actually can't have static inner > classes anyway.) But why even have Context in there anyway, given that > it does nothing beyond MapContext's implementation? > > We can still change the map method's signature to use MapContext. > > As it is, you have to write: > class MyMapper extends Mapper<W, X, Y, Z> { > void map(W k, X v, Context c) { > .... > } > } > > To make the change, you would now need to write: > > class MyMapper extends Mapper<W, X, Y, Z> { > void map(W k, X v, MapContext<W,X,Y,Z> c) { > .... > } > } > > So I think the primary reason for the non-static inner member is to > save you some typing. That having been said, it makes adding mock > implementations of Context really difficult. > > The real reason this is a tricky change, though, is that this is > (surprisingly, to me) an incompatible change -- even though Context > subclasses MapContext, so it's a type-safe API change to widen the set > of inputs we accept in map(), anyone who specified @Override on the > method and uses Context as an argument will get a compiler error :\ > (@Override, it seems, uses the literal names even if there's a > type-safe promotion to a method in a superclass.) > > Can we still make incompatible API changes on the new API, or is it > officially frozen? If incompatible changes are allowed, I'd like to > see this in. I think that in the interest of better > mockability/extensibility, it'd be cleaner to ditch the inner Context > class in favor of explicit use of MapContext. We know Java's type > system is verbose, but that doesn't mean we should try to hack around > it, if that means losing functionality. (I think I know how to get > around this in MRUnit, but it'd be cleaner to not have to.) > > - Aaron > > > > On Wed, Jul 22, 2009 at 7:22 PM, Aaron Kimball<[email protected]> wrote: >> Both of those are good points. I'll submit a patch. >> - Aaron >> >> On Wed, Jul 22, 2009 at 6:24 PM, Ted Dunning<[email protected]> wrote: >>> To amplify David's point, why is the argument a Mapper.Context rather than >>> MapContext? >>> >>> Also, why is the Mapper.Context not static? >>> >>> On Wed, Jul 22, 2009 at 5:29 PM, David Hall <[email protected]> wrote: >>> >>>> This is nice, but doesn't it suffer from the same problem? MRUnit uses >>>> the mapred API, which is deprecated, and the new API doesn't use >>>> OutputCollector, but a non-static inner class. >>>> >>>> -- David >>>> >>>> On Wed, Jul 22, 2009 at 4:52 PM, Aaron Kimball<[email protected]> wrote: >>>> > Hi David, >>>> > >>>> > I wrote a contrib module called MRUnit >>>> > (http://issues.apache.org/jira/browse/hadoop-5518) designed to allow >>>> > unit tests for mappers/reducers more easily. It's slated for inclusion >>>> > in 0.21, not 0.20 unfortunately, but you can download the patch above >>>> > as well as MAPREDUCE-680 and build it against any earlier version of >>>> > Hadoop. Unfortunately, it doesn't currently support the new APIs >>>> > (e.g., with Context objects), but I imagine this could be added with >>>> > little difficulty. I just haven't had time to do it myself ;) If you'd >>>> > like to take a stab at it, I'd love some help! >>>> > >>>> > More info is at www.cloudera.com/hadoop-mrunit >>>> > Cheers, >>>> > - Aaron >>>> > >>>> > On Wed, Jul 22, 2009 at 2:49 PM, David Hall<[email protected]> wrote: >>>> >> Hi, >>>> >> >>>> >> I'm a student working with Apache Mahout for the Google Summer of >>>> >> Code. We recently moved to 0.20.0, and I was porting my code to the >>>> >> new API. Unfortunately, I (and the whole project team) seem to have >>>> >> run into a problem when it comes to testing them. >>>> >> >>>> >> Historically, we would create a Mapper in a unit test, and a special >>>> >> "DummyOutputCollector", which was essentially a multimap dressed up to >>>> >> conform to OutputCollector. In Hadoop 0.20.0, this isn't possible >>>> >> anymore, because Mappers take an instance of an inner class. >>>> >> >>>> >> It's of course possible to dress up the Context in something else >>>> >> (say, something just like an OutputCollector), and to specify that >>>> >> Mahout Mappers should just delegate to a method that takes an >>>> >> OutputCollector. But, this seems to not be very idiomatic. >>>> >> >>>> >> All this goes to say, what would be a "best practice" for testing >>>> >> Mappers and Reducers in 0.20.0? >>>> >> >>>> >> Thanks, >>>> >> David Hall >>>> >> >>>> > >>>> >>> >>> >>> >>> -- >>> Ted Dunning, CTO >>> DeepDyve >>> >> >
