[ https://issues.apache.org/jira/browse/MAPREDUCE-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806136#action_12806136 ]
Chris Douglas commented on MAPREDUCE-1126:
------------------------------------------

bq. "1) throwing away all Java type hierarchies". Only sometimes, no? This is only in the case where you explicitly want to do unions (and Java's union support is either Object, type hierarchies, or wrappers). In the typical case, your map functions on SomeSpecificRecordType, outputs SomeSpecificMapOutputKey/ValueType, and so forth. You still get type safety in many of the recommended use cases.

Sure, but this doesn't cite the negative side. The patch changes the collection from a 1:1 class match (more restrictive than the Java types) to a model unrelated to the declared types. So if a job accepts {{Short}}, {{Integer}}, and {{Long}}, it may declare its type as {{Numeric}} but, again depending on the serialization details, may reject (or fail to reject) {{Double}} and {{Float}}. So instead of the user being forced to declare a union type, whether this is reasonable is decided between the serialization and the user.

This is a contrived example, but one can easily imagine other scenarios where an accepted subset of the supertypes is not only unenforced by the framework, but unenforceable. The even more interesting case is when one has a type hierarchy the serializer cares about that isn't expressed in Java types (e.g., valid keys contain a field named "dingo" whose supertype is Yak).

The proposed model doesn't make it impossible to write type-checked code, but it does make it easier to write code that isn't (which, again: great for frameworks, arguably not as good for users). As I said earlier, it's a powerful but dramatic shift from the current model that should be carefully considered.

bq. what happens if my serialization code is not written in Java and I have to use JNI to get to it?

I don't think any model yet proposed would make this harder than it is today.
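To make the {{Short}}/{{Integer}}/{{Long}} example concrete, here is a minimal sketch of the concern. It is not code from the patch and does not use the Hadoop API: {{SketchSerialization}}, {{IntegralSerialization}}, and {{SketchCollector}} are hypothetical names, and {{java.lang.Number}} stands in for the {{Numeric}} supertype. The declared key type admits every {{Number}}, but which subtypes are actually accepted is decided by the serialization at runtime, so the compiler cannot catch a {{Double}}.

```java
import java.util.Set;

final class DeclaredVsAcceptedTypes {

    /** Hypothetical stand-in for a serialization that decides which classes it handles. */
    interface SketchSerialization {
        boolean accept(Class<?> c);
    }

    /** Assumed serialization that handles only the integral subset of Number. */
    static final class IntegralSerialization implements SketchSerialization {
        private static final Set<Class<?>> ACCEPTED =
            Set.of(Short.class, Integer.class, Long.class);

        @Override
        public boolean accept(Class<?> c) {
            return ACCEPTED.contains(c);
        }
    }

    /** Hypothetical collector: the declared key type is Number, the real check is delegated. */
    static final class SketchCollector {
        private final SketchSerialization serialization;

        SketchCollector(SketchSerialization serialization) {
            this.serialization = serialization;
        }

        void collect(Number key) {
            // The compiler admits any Number here; rejection happens only at runtime.
            if (!serialization.accept(key.getClass())) {
                throw new IllegalArgumentException(
                    "Serialization rejects " + key.getClass().getName());
            }
        }
    }

    /** Returns true if the key is collected, false if the serialization rejects it. */
    static boolean collects(Number key) {
        try {
            new SketchCollector(new IntegralSerialization()).collect(key);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Number[] keys = {
            Short.valueOf((short) 1), Integer.valueOf(2), Long.valueOf(3L),
            Double.valueOf(4.0), Float.valueOf(5.0f)
        };
        for (Number key : keys) {
            System.out.println(key.getClass().getSimpleName() + " collected: " + collects(key));
        }
    }
}
```

Every call in {{main}} type-checks against the declared {{Number}}; whether {{Double}} and {{Float}} are rejected depends entirely on what the serialization chooses to accept.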
> shuffle should use serialization to get comparator
> --------------------------------------------------
>
>                 Key: MAPREDUCE-1126
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1126
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task
>            Reporter: Doug Cutting
>            Assignee: Aaron Kimball
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1126.2.patch, MAPREDUCE-1126.3.patch,
>                      MAPREDUCE-1126.4.patch, MAPREDUCE-1126.5.patch,
>                      MAPREDUCE-1126.6.patch, MAPREDUCE-1126.patch,
>                      MAPREDUCE-1126.patch
>
>
> Currently the key comparator is defined as a Java class. Instead we should
> use the Serialization API to create key comparators. This would permit,
> e.g., Avro-based comparators to be used, permitting efficient sorting of
> complex data types without having to write a RawComparator in Java.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.