[ https://issues.apache.org/jira/browse/MAPREDUCE-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806472#action_12806472 ]
Scott Carey commented on MAPREDUCE-1126:
----------------------------------------

b...@scott: the annotations for Input/OutputFormat seem to be misplaced. It seems desirable to be able to write a single Map function that does wordcount on Strings, regardless of whether those strings are stored in newline-delimited text, sequence files, avro data files, or whatever.

Philip, yes, they are not in the right place. I just wanted to bring into the conversation that 'SomeObject.setSomeBinding()' is not the only way to do this sort of thing. Annotations, unlike setter methods, can be moved around and adapted to work in various ways without breaking APIs.

For example, the Input/OutputFormat annotation could go either on a Map class or on some other, more specific annotation site, with defaults and priority (set on configuration > annotated on configuration > annotated on map > default) determining which applies.

After thinking about it a bit more, and doing some research into how other APIs do some tricky things with Annotations, there are a few things to consider.

* It is possible in some situations to infer the generic types of a class at runtime by constructing an instance of an object with the same type arguments. Example: http://wiki.fasterxml.com/JacksonInFiveMinutes#Data_Binding_with_Generics
* Annotations on class A can be applied to class B ("Mix-In Annotations"): http://wiki.fasterxml.com/JacksonMixInAnnotations
* Post-compile time checks via an annotation processor can validate code before run time in cases where the current M/R framework only breaks at run time.

What I think is most important to this discussion is that some layers of configuration complexity can be hidden from users, and some of it deferred to the future. The 'site' of the configuration can be moved around with Annotations, opening up ways to simplify the steps required to do declarative configuration.
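To make the priority idea concrete (set on configuration > annotated on configuration > annotated on map > default), here is a rough sketch of how such a lookup might work. Everything in it is hypothetical and for illustration only: the `@InputFormatName` annotation, the `FormatResolver` class, the placeholder config and map classes, and the format name strings are not part of Hadoop or of any patch attached to this issue.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical annotation (not a Hadoop API). RUNTIME retention is
// required so that it can be read reflectively when the job is set up.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface InputFormatName {
    String value();
}

// Placeholder classes standing in for a job configuration and a Map class.
@InputFormatName("sequence-file")
class AnnotatedJobConfig {}

class PlainJobConfig {}

@InputFormatName("avro")
class WordCountMap {}

class PlainMap {}

final class FormatResolver {
    // Assumed default, used when nothing else applies.
    static final String DEFAULT_FORMAT = "text";

    private FormatResolver() {}

    // Precedence: explicit setting > annotation on the configuration class
    // > annotation on the map class > default.
    static String resolve(String explicitSetting, Class<?> configClass, Class<?> mapClass) {
        if (explicitSetting != null) {
            return explicitSetting;
        }
        InputFormatName onConfig = configClass.getAnnotation(InputFormatName.class);
        if (onConfig != null) {
            return onConfig.value();
        }
        InputFormatName onMap = mapClass.getAnnotation(InputFormatName.class);
        if (onMap != null) {
            return onMap.value();
        }
        return DEFAULT_FORMAT;
    }
}
```

With this arrangement an un-annotated Map class stays format-agnostic, and a new annotation site can be added later by inserting one more step into `resolve` without changing any existing setter.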
With this in mind, some additional complexity in the procedural configuration methods is more acceptable if there are good defaults and a later (backwards compatible) API addition simplifies things. Likewise, some elements of complexity can be skipped for now if it can be seen that they could be made available through a configuration extension later. Perhaps the procedural API would never allow configuring a key and value to use different serializers, to avoid API complexity, but an annotation extension in the future could allow that.

> shuffle should use serialization to get comparator
> --------------------------------------------------
>
>                 Key: MAPREDUCE-1126
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1126
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task
>            Reporter: Doug Cutting
>            Assignee: Aaron Kimball
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1126.2.patch, MAPREDUCE-1126.3.patch,
> MAPREDUCE-1126.4.patch, MAPREDUCE-1126.5.patch, MAPREDUCE-1126.6.patch,
> MAPREDUCE-1126.patch, MAPREDUCE-1126.patch
>
>
> Currently the key comparator is defined as a Java class. Instead we should
> use the Serialization API to create key comparators. This would permit,
> e.g., Avro-based comparators to be used, permitting efficient sorting of
> complex data types without having to write a RawComparator in Java.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.