[jira] Commented: (MAPREDUCE-1126) shuffle should use serialization to get comparator

Ted Dunning (JIRA) Tue, 26 Jan 2010 14:04:57 -0800

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805233#action_12805233
 ]


Ted Dunning commented on MAPREDUCE-1126:
----------------------------------------


Isn't there a middle ground available (at least from the user's point of view)?

My thought would be that if the user specifies types in the current style, they 
would be limited to Writables in the current fashion.  That could be marked as 
old-fashioned, but I wouldn't necessarily deprecate it.  It does leave Writable 
in a privileged position relative to other serialization frameworks, but it 
*is* in a privileged position since it existed first.

Alternatively, the user could specify a serialization-framework-specific 
configuration, much as Doug suggests.  If any non-standard serialization is 
used, specifying a type should be an error, and vice versa.  This should be 
easy to check.
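
A minimal sketch of such a check, assuming the job configuration can be 
viewed as a plain string map.  The two property names and the class name are 
hypothetical and are not real Hadoop keys; treating "neither set" as old 
style is also an assumption, chosen so that existing jobs keep working.

```java
import java.util.Map;

/** Hypothetical sketch only; not part of Hadoop or of any attached patch. */
class JobConfigCheck {
    // Assumed property name for the old-style (Writable) key class.
    static final String OLD_STYLE_KEY_CLASS = "example.map.output.key.class";
    // Assumed property name for a framework-specific serialization setting.
    static final String NEW_STYLE_KEY_SERIALIZATION = "example.map.output.key.serialization";

    enum Style { OLD_WRITABLE, NEW_SERIALIZATION }

    /** Returns the configuration style, rejecting a mix of the two. */
    static Style detectStyle(Map<String, String> conf) {
        boolean hasType = conf.containsKey(OLD_STYLE_KEY_CLASS);
        boolean hasSerialization = conf.containsKey(NEW_STYLE_KEY_SERIALIZATION);
        if (hasType && hasSerialization) {
            throw new IllegalArgumentException(
                "Specify either a key type or a serialization, not both");
        }
        // Assumption: a job that sets neither keeps the old Writable behaviour.
        return hasSerialization ? Style.NEW_SERIALIZATION : Style.OLD_WRITABLE;
    }
}
```

Old-style jobs pass through unchanged, new-style jobs are recognised, and 
only a job that mixes the two styles is rejected at submission time.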

From the user's point of view, they could use old-style job configuration or 
the new style that Doug suggests.  I strongly prefer the new style, but I 
would not be eager to change all my old-style programs.

Under the covers, almost anything could happen.  The important point is that 
if any special serialization is invoked, the job config would need to know 
about it, and that might affect many other components, such as the shuffle.
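
To illustrate why the shuffle is affected, here is a small sketch in which 
the sort order of raw keys comes from the configured serialization rather 
than from a fixed Java class.  The KeySerialization interface and both 
implementations are hypothetical stand-ins and do not reflect Hadoop's 
actual Serialization API.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch only; the types here are not Hadoop classes. */
class ShuffleComparatorSketch {
    /** Stand-in for a serialization framework that knows how to order its raw keys. */
    interface KeySerialization {
        Comparator<byte[]> rawComparator();
    }

    /** Keys are 4-byte big-endian signed integers. */
    static final class SignedIntSerialization implements KeySerialization {
        public Comparator<byte[]> rawComparator() {
            return (a, b) -> Integer.compare(decode(a), decode(b));
        }
    }

    /** Keys are opaque bytes compared lexicographically as unsigned values. */
    static final class UnsignedBytesSerialization implements KeySerialization {
        public Comparator<byte[]> rawComparator() {
            return (a, b) -> Arrays.compareUnsigned(a, b);
        }
    }

    /** The shuffle sorts with whatever comparator the serialization supplies. */
    static List<byte[]> sortForShuffle(List<byte[]> keys, KeySerialization serialization) {
        List<byte[]> sorted = new ArrayList<>(keys);
        sorted.sort(serialization.rawComparator());
        return sorted;
    }

    static byte[] encode(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    static int decode(byte[] raw) {
        return ByteBuffer.wrap(raw).getInt();
    }
}
```

Sorting the encoded keys 3, -1, 2 yields -1, 2, 3 with the signed-integer 
serialization and 2, 3, -1 with the unsigned-bytes one, so the same bytes 
are ordered differently depending on which serialization the job config 
names.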

Is there any technical reason why this cannot be made to work?

Is there really any philosophical reason that old programs must be broken?

If no and no, why is there a problem here?  I think that this middle ground 
would satisfy Owen's (and my own) needs for backwards compatibility as well as 
Doug's (and my own) desire for flexibility for serialization.



> shuffle should use serialization to get comparator
> --------------------------------------------------
>
>                 Key: MAPREDUCE-1126
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1126
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task
>            Reporter: Doug Cutting
>            Assignee: Aaron Kimball
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1126.2.patch, MAPREDUCE-1126.3.patch, 
> MAPREDUCE-1126.4.patch, MAPREDUCE-1126.5.patch, MAPREDUCE-1126.6.patch, 
> MAPREDUCE-1126.patch
>
>
> Currently the key comparator is defined as a Java class.  Instead we should 
> use the Serialization API to create key comparators.  This would permit, 
> e.g., Avro-based comparators to be used, permitting efficient sorting of 
> complex data types without having to write a RawComparator in Java.

