[jira] Commented: (MAPREDUCE-1126) shuffle should use serialization to get comparator

Chris Douglas (JIRA) Thu, 28 Jan 2010 14:39:59 -0800

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806136#action_12806136
 ]


Chris Douglas commented on MAPREDUCE-1126:
------------------------------------------

bq. "1) throwing away all Java type hierarchies". Only sometimes, no? This is 
only in the case where you explicitly want to do unions (and Java's union 
support is either Object, type hierarchies, or wrappers). In the typical case, 
your map functions on SomeSpecificRecordType, outputs 
SomeSpecificMapOutputKey/ValueType, and so forth. You still get type safety in 
many of the recommended use cases.

Sure, but this doesn't cite the negative side. The patch changes the check at 
collection from a 1:1 class match, which is more restrictive than the Java 
types, to a model unrelated to the declared types. So if a job accepts 
{{Short}}, {{Integer}}, and {{Long}}, it may declare its type as {{Numeric}} 
but, again depending on the serialization details, may reject (or fail to 
reject) {{Double}} and {{Float}}. So instead of the user being forced to 
declare a union type, whether a given type is reasonable is decided between 
the serialization and the user. This is a contrived example, but one can 
easily imagine other scenarios where an accepted subset of the supertypes is 
not only unenforced by the framework, but unenforceable. The even more 
interesting case is when the serializer cares about a type hierarchy that 
isn't expressed in Java types (e.g., valid keys contain a field named "dingo" 
whose supertype is Yak). The proposed model doesn't make it impossible to 
write type-checked code, but it does make it easier to write code that isn't 
(which, again, is great for frameworks and arguably not as good for users). 
As I said earlier, it's a powerful but dramatic shift from the current model, 
and it should be carefully considered.
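
A minimal sketch of the contrast, for illustration only. None of these names 
come from the patch: {{KeyAcceptor}}, {{ExactClassCollector}}, and 
{{SerializationCollector}} are hypothetical stand-ins, and the sketch assumes 
the job's accepted set is the integral boxed types (with {{java.lang.Number}} 
as the natural declared supertype, standing in for {{Numeric}} above). The 
first collector models the exact-class check; the second defers entirely to 
whatever the serialization chooses to accept.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a serialization's acceptance rule; not a Hadoop API. */
interface KeyAcceptor {
    boolean accepts(Object key);
}

/** Models the 1:1 class match: the runtime class must equal the declared class. */
final class ExactClassCollector {
    private final Class<?> declared;
    private final List<Object> collected = new ArrayList<>();

    ExactClassCollector(Class<?> declared) {
        this.declared = declared;
    }

    void collect(Object key) {
        // Stricter than Java assignment rules: subclasses are rejected too.
        if (key.getClass() != declared) {
            throw new IllegalArgumentException("Type mismatch in key: expected "
                + declared.getName() + ", received " + key.getClass().getName());
        }
        collected.add(key);
    }

    int size() {
        return collected.size();
    }
}

/** Models the alternative: the declared Java type plays no part in the check. */
final class SerializationCollector {
    private final KeyAcceptor acceptor;
    private final List<Object> collected = new ArrayList<>();

    SerializationCollector(KeyAcceptor acceptor) {
        this.acceptor = acceptor;
    }

    void collect(Object key) {
        // Only the serialization decides; the framework enforces nothing itself.
        if (!acceptor.accepts(key)) {
            throw new IllegalArgumentException("Serialization rejected key of type "
                + key.getClass().getName());
        }
        collected.add(key);
    }

    int size() {
        return collected.size();
    }
}

final class CollectorDemo {
    /** Accepts Short, Integer, and Long only, whatever supertype the job declares. */
    static final KeyAcceptor INTEGRAL_ONLY =
        key -> key instanceof Short || key instanceof Integer || key instanceof Long;

    /** Returns true if the action was rejected with IllegalArgumentException. */
    static boolean rejects(Runnable action) {
        try {
            action.run();
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        ExactClassCollector exact = new ExactClassCollector(Integer.class);
        exact.collect(42);
        System.out.println("exact rejects Long: " + rejects(() -> exact.collect(42L)));

        SerializationCollector loose = new SerializationCollector(INTEGRAL_ONLY);
        loose.collect((short) 1);
        loose.collect(2);
        loose.collect(3L);
        System.out.println("loose rejects Double: " + rejects(() -> loose.collect(1.5)));
    }
}
```

In the second collector, swapping {{INTEGRAL_ONLY}} for a rule that accepts 
everything compiles and runs unchanged, which is the "fail to reject" case: 
nothing in the Java types records which subset the job actually supports.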

bq. what happens if my serialization code is not written in Java and I have to 
use JNI to get to it?

I don't think any model yet proposed would make this harder than it is today.

> shuffle should use serialization to get comparator
> --------------------------------------------------
>
>                 Key: MAPREDUCE-1126
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1126
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task
>            Reporter: Doug Cutting
>            Assignee: Aaron Kimball
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1126.2.patch, MAPREDUCE-1126.3.patch, 
> MAPREDUCE-1126.4.patch, MAPREDUCE-1126.5.patch, MAPREDUCE-1126.6.patch, 
> MAPREDUCE-1126.patch, MAPREDUCE-1126.patch
>
>
> Currently the key comparator is defined as a Java class.  Instead we should 
> use the Serialization API to create key comparators.  This would permit, 
> e.g., Avro-based comparators to be used, permitting efficient sorting of 
> complex data types without having to write a RawComparator in Java.
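
The idea in the description above can be sketched roughly as follows. This is 
an illustration under stated assumptions, not the interface from any of the 
attached patches: {{SortableSerialization}}, {{LongKeySerialization}}, and 
{{ShuffleSortDemo}} are hypothetical names, and the comparator here is a plain 
{{java.util.Comparator}} over byte arrays rather than Hadoop's 
{{RawComparator}}. The point is only that the sort asks the serialization for 
an ordering over encoded keys instead of instantiating a configured comparator 
class.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical: a serialization that also supplies an ordering over its encoded bytes. */
interface SortableSerialization<T> {
    byte[] serialize(T key);

    T deserialize(byte[] bytes);

    Comparator<byte[]> rawComparator();
}

/** Example serialization for Long keys using an 8-byte big-endian encoding. */
final class LongKeySerialization implements SortableSerialization<Long> {
    @Override
    public byte[] serialize(Long key) {
        return ByteBuffer.allocate(Long.BYTES).putLong(key).array();
    }

    @Override
    public Long deserialize(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }

    @Override
    public Comparator<byte[]> rawComparator() {
        // The serialization knows its own encoding, so it can compare encoded keys.
        return (a, b) -> Long.compare(ByteBuffer.wrap(a).getLong(),
                                      ByteBuffer.wrap(b).getLong());
    }
}

final class ShuffleSortDemo {
    /** Sorts keys in serialized form, using only what the serialization provides. */
    static <T> List<T> sortKeys(SortableSerialization<T> ser, List<T> keys) {
        List<byte[]> raw = new ArrayList<>();
        for (T key : keys) {
            raw.add(ser.serialize(key));
        }
        raw.sort(ser.rawComparator());

        List<T> sorted = new ArrayList<>();
        for (byte[] bytes : raw) {
            sorted.add(ser.deserialize(bytes));
        }
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(sortKeys(new LongKeySerialization(), Arrays.asList(3L, -1L, 2L)));
    }
}
```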

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
