[ https://issues.apache.org/jira/browse/MAPREDUCE-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803975#action_12803975 ]

Tom White commented on MAPREDUCE-1126:
--------------------------------------

This patch makes it possible for MapReduce to take advantage of the more
general serialization mechanism introduced in HADOOP-6165. Until now, the
serialization used for a key or value has been determined by the type of the
key or value. This is not sufficiently general for Avro, which is what
motivated the work in HADOOP-6165. (Note that HADOOP-6165 did not have any
effect on user APIs, since users don't typically interact with serialization
classes directly.) However, it is true that many serialization frameworks *are*
type driven (Writables, Thrift, Java Serialization and Avro Specific, to name a
few), so I think there may be an argument for retaining
job.setMapOutputKeyClass() as it currently stands. The advantage is that
existing Writable-based jobs would not have to be changed, which I think is at
the heart of Owen's criticism.
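To make the type-driven model concrete, here is a minimal Java sketch in which
the serialization for a key is picked solely from its declared class, the role
job.setMapOutputKeyClass() plays today. Every name in it (WritableLike, IntKey,
TypeDrivenSerialization, TypeDrivenFactory) is hypothetical and not a Hadoop
API, and the "first registered serialization that accepts the class wins"
lookup is an assumption made for illustration.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical marker standing in for a Writable-style key type (not Hadoop's Writable).
interface WritableLike {}

// A key type that implements only the hypothetical marker, not Serializable.
class IntKey implements WritableLike {
    final int value;
    IntKey(int value) { this.value = value; }
}

// Hypothetical type-driven serialization: it decides from the Class alone.
interface TypeDrivenSerialization {
    boolean accept(Class<?> c);
    String name();
}

class TypeDrivenFactory {
    private final List<TypeDrivenSerialization> serializations = new ArrayList<>();

    void register(TypeDrivenSerialization s) { serializations.add(s); }

    // Assumed policy: the first registered serialization that accepts the class wins.
    TypeDrivenSerialization forClass(Class<?> c) {
        for (TypeDrivenSerialization s : serializations) {
            if (s.accept(c)) return s;
        }
        throw new IllegalArgumentException("No serialization for " + c.getName());
    }

    static TypeDrivenFactory defaultFactory() {
        TypeDrivenFactory f = new TypeDrivenFactory();
        // Accepts any class implementing the hypothetical WritableLike marker.
        f.register(new TypeDrivenSerialization() {
            public boolean accept(Class<?> c) { return WritableLike.class.isAssignableFrom(c); }
            public String name() { return "writable"; }
        });
        // Stand-in for Java Serialization: accepts any java.io.Serializable class.
        f.register(new TypeDrivenSerialization() {
            public boolean accept(Class<?> c) { return Serializable.class.isAssignableFrom(c); }
            public String name() { return "java"; }
        });
        return f;
    }
}

class TypeDrivenDemo {
    public static void main(String[] args) {
        TypeDrivenFactory f = TypeDrivenFactory.defaultFactory();
        // The declared key class alone selects the serialization.
        System.out.println(f.forClass(IntKey.class).name()); // writable
        System.out.println(f.forClass(String.class).name()); // java
    }
}
```

The point of the sketch is that an existing job only has to name its key class;
nothing else in its configuration needs to change.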

For Avro Generic, or other serializations where the schema for the types needs
to be specified, we can use the AvroGenericJobData class in this patch; in that
case there would be no need to call job.setMapOutputKeyClass(). (BTW Aaron, why
does SchemaBasedJobData exist? It seems to reference Avro internally, even
though its name suggests it is general.)
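A minimal Java sketch of the schema-driven alternative, assuming serializations
are selected from a string metadata map rather than from a Class: a
schema-based serialization accepts metadata that carries a schema, so no key
class has to be declared at all. The interface, the factory and the metadata
keys "class" and "schema" are hypothetical names chosen for illustration; they
are not the HADOOP-6165 or AvroGenericJobData APIs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical: a serialization that decides from metadata, not from a Class.
interface MetadataSerialization {
    boolean accept(Map<String, String> metadata);
    String name();
}

class MetadataFactory {
    // Hypothetical metadata keys, for illustration only.
    static final String CLASS_KEY = "class";
    static final String SCHEMA_KEY = "schema";

    private final List<MetadataSerialization> serializations = new ArrayList<>();

    void register(MetadataSerialization s) { serializations.add(s); }

    // The first registered serialization that accepts the metadata wins.
    MetadataSerialization forMetadata(Map<String, String> metadata) {
        for (MetadataSerialization s : serializations) {
            if (s.accept(metadata)) return s;
        }
        throw new IllegalArgumentException("No serialization accepts " + metadata);
    }

    static MetadataFactory defaultFactory() {
        MetadataFactory f = new MetadataFactory();
        // Schema-driven (Avro Generic style): needs a schema, not a class.
        f.register(new MetadataSerialization() {
            public boolean accept(Map<String, String> m) { return m.containsKey(SCHEMA_KEY); }
            public String name() { return "schema-based"; }
        });
        // Type-driven: the class name is simply one more metadata entry.
        f.register(new MetadataSerialization() {
            public boolean accept(Map<String, String> m) { return m.containsKey(CLASS_KEY); }
            public String name() { return "class-based"; }
        });
        return f;
    }
}

class MetadataDemo {
    public static void main(String[] args) {
        MetadataFactory f = MetadataFactory.defaultFactory();

        Map<String, String> genericKey = new HashMap<>();
        genericKey.put(MetadataFactory.SCHEMA_KEY, "{\"type\": \"string\"}");
        System.out.println(f.forMetadata(genericKey).name()); // schema-based

        Map<String, String> typedKey = new HashMap<>();
        typedKey.put(MetadataFactory.CLASS_KEY, "org.apache.hadoop.io.Text");
        System.out.println(f.forMetadata(typedKey).name()); // class-based
    }
}
```

In this sketch the type-driven path survives as just one kind of metadata, which
is one way both styles could coexist.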

Would this address folks' concerns?


> shuffle should use serialization to get comparator
> --------------------------------------------------
>
>                 Key: MAPREDUCE-1126
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1126
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task
>            Reporter: Doug Cutting
>            Assignee: Aaron Kimball
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1126.2.patch, MAPREDUCE-1126.3.patch, 
> MAPREDUCE-1126.4.patch, MAPREDUCE-1126.5.patch, MAPREDUCE-1126.6.patch, 
> MAPREDUCE-1126.patch
>
>
> Currently the key comparator is defined as a Java class.  Instead we should 
> use the Serialization API to create key comparators.  This would permit, 
> e.g., Avro-based comparators to be used, permitting efficient sorting of 
> complex data types without having to write a RawComparator in Java.
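As a rough illustration of the idea in the description, the Java sketch below
has the serialization itself hand out a comparator over serialized bytes, so
keys can be sorted without being deserialized. ComparableSerialization and
SortableIntSerialization are hypothetical names, not Hadoop's Serialization or
RawComparator APIs, and the sign-bit-flipping int encoding is just one simple
way to make byte order match numeric order.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical: the serialization, not a separately configured class, supplies the comparator.
interface ComparableSerialization<T> {
    byte[] serialize(T value);
    T deserialize(byte[] bytes);
    Comparator<byte[]> rawComparator();
}

class SortableIntSerialization implements ComparableSerialization<Integer> {
    public byte[] serialize(Integer v) {
        // Big-endian with the sign bit flipped, so unsigned byte order matches signed int order.
        return ByteBuffer.allocate(4).putInt(v ^ Integer.MIN_VALUE).array();
    }

    public Integer deserialize(byte[] b) {
        return ByteBuffer.wrap(b).getInt() ^ Integer.MIN_VALUE;
    }

    public Comparator<byte[]> rawComparator() {
        // Lexicographic comparison of unsigned bytes; no deserialization needed.
        return (a, b) -> {
            int n = Math.min(a.length, b.length);
            for (int i = 0; i < n; i++) {
                int c = Integer.compare(a[i] & 0xff, b[i] & 0xff);
                if (c != 0) return c;
            }
            return Integer.compare(a.length, b.length);
        };
    }
}

class ComparatorDemo {
    // Serializes the values, sorts the raw bytes, then deserializes the result.
    static int[] sortViaRawBytes(int[] values) {
        SortableIntSerialization ser = new SortableIntSerialization();
        byte[][] keys = new byte[values.length][];
        for (int i = 0; i < values.length; i++) keys[i] = ser.serialize(values[i]);
        Arrays.sort(keys, ser.rawComparator());
        int[] out = new int[keys.length];
        for (int i = 0; i < keys.length; i++) out[i] = ser.deserialize(keys[i]);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sortViaRawBytes(new int[] {5, -3, 0, 42, -100})));
        // prints [-100, -3, 0, 5, 42]
    }
}
```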

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
