[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805618#action_12805618
 ] 

Alan Gates commented on MAPREDUCE-1126:
---------------------------------------

bq. [From Owen] My assertion is that leaving the type as the primary instrument 
of the user in defining the job is correct. I haven't talked to any users that 
care about using a non-default serializer for a given type.

Pig would like to.  For scalar types Pig uses Java String, Long, Integer, etc.  
Default Java serialization is slow, though, so we currently convert these to 
and from Writables as data crosses the Map and Reduce boundaries to get the 
faster Writable serialization.  If we could instead define an alternate 
serializer and avoid these conversions, it would make our code simpler and 
should perform better.
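
As a rough illustration of the overhead being described, the sketch below 
encodes one Long two ways using only java.io: once with default Java 
serialization, and once with a compact fixed-width encoding similar in spirit 
to what a LongWritable writes.  The class and method names are hypothetical, 
nothing here is Pig or Hadoop code, and it compares encoded size only, not 
speed.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Hypothetical sketch: compares the encoded size of a Long under default
// Java serialization with a compact, Writable-style fixed-width encoding.
public class SerializationSketch {

    // Default Java serialization: the stream carries a header and a class
    // descriptor in addition to the value itself.
    static byte[] javaSerialize(Long value) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Compact encoding: only the eight bytes of the long are written.
    static byte[] compactSerialize(long value) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeLong(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println("default Java serialization: "
                + javaSerialize(42L).length + " bytes");
        System.out.println("compact encoding:           "
                + compactSerialize(42L).length + " bytes");
    }
}
```

With a pluggable serializer for java.lang.Long, the compact form could be 
written directly, without first wrapping the value in a Writable.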

> shuffle should use serialization to get comparator
> --------------------------------------------------
>
>                 Key: MAPREDUCE-1126
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1126
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task
>            Reporter: Doug Cutting
>            Assignee: Aaron Kimball
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1126.2.patch, MAPREDUCE-1126.3.patch, 
> MAPREDUCE-1126.4.patch, MAPREDUCE-1126.5.patch, MAPREDUCE-1126.6.patch, 
> MAPREDUCE-1126.patch
>
>
> Currently the key comparator is defined as a Java class.  Instead we should 
> use the Serialization API to create key comparators.  This would permit, 
> e.g., Avro-based comparators to be used, permitting efficient sorting of 
> complex data types without having to write a RawComparator in Java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
