[jira] Commented: (HADOOP-6685) Change the generic serialization framework API to use serialization-specific bytes instead of Map for configuration

Owen O'Malley (JIRA) Thu, 18 Nov 2010 09:02:46 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933475#action_12933475
 ]


Owen O'Malley commented on HADOOP-6685:
---------------------------------------

Tom, I understood your point. You aren't seeing mine.

There are three use cases (using Thrift as the library in question):
* _The job is not using Thrift._ In this case it doesn't matter whether the 
thrift jar and the plugin class are on the classpath.
* _The job is using the same version of Thrift._ The two solutions look like:
** If the plugin is built into Hadoop, the user just uses their class.
** If the plugin is a separate jar, the application must copy both the thrift 
jar and the thrift plugin into HDFS and put them in their distributed cache. 
They also need to put them on the task's (and the launching node's) 
classpath.
* _The job is using a different version of Thrift._ This is the 1% case. The 
two solutions look like:
** If the plugin is built into Hadoop, the application must put the thrift jar 
into the distributed cache.
** If the plugin is separate, the application must put the thrift jar and the 
plugin jar into the distributed cache.
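
As a rough illustration of the cases above, the following sketch tabulates which jars the application has to ship through the distributed cache in each case. The class, enum, and method names are hypothetical and exist only for this example; nothing here is Hadoop API, and the jar names are placeholders.

```java
import java.util.List;

public class JarsToShip {
    // The three use cases from the list above (names are illustrative).
    enum ThriftUse { NOT_USED, SAME_VERSION, DIFFERENT_VERSION }

    // Returns the jars the application itself must put in the distributed
    // cache, given the use case and whether the plugin is built into Hadoop.
    static List<String> jarsToShip(ThriftUse use, boolean pluginBuiltIn) {
        switch (use) {
            case NOT_USED:
                // Nothing to ship either way.
                return List.of();
            case SAME_VERSION:
                // Built in: the user just uses their class.
                return pluginBuiltIn
                        ? List.of()
                        : List.of("thrift jar", "thrift plugin jar");
            default:
                // Different version: the thrift jar must always be shipped.
                return pluginBuiltIn
                        ? List.of("thrift jar")
                        : List.of("thrift jar", "thrift plugin jar");
        }
    }

    public static void main(String[] args) {
        for (ThriftUse use : ThriftUse.values()) {
            System.out.println(use
                    + " built-in=" + jarsToShip(use, true)
                    + " separate=" + jarsToShip(use, false));
        }
    }
}
```

The only row where the two approaches differ in the common case is SAME_VERSION: with a built-in plugin the application ships nothing, while a separate plugin requires shipping both jars.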

Also note that with the distributed cache, user mistakes can easily leave a 
copy of the thrift jar and the plugin per job or per user on all of the 
slave nodes.

In summary, the user can always override the version distributed with Hadoop. 
The question is just how convenient we can make the standard use cases. We have 
added many dependencies over the years and they've never provoked this kind of 
objection.

> Change the generic serialization framework API to use serialization-specific 
> bytes instead of Map<String,String> for configuration
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6685
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6685
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>             Fix For: 0.22.0
>
>         Attachments: libthrift.jar, serial.patch, serial4.patch, 
> serial6.patch, SerializationAtSummit.pdf
>
>
> Currently, the generic serialization framework uses Map<String,String> for 
> the serialization specific configuration. Since this data is really internal 
> to the specific serialization, I think we should change it to be an opaque 
> binary blob. This will simplify the interface for defining specific 
> serializations for different contexts (MAPREDUCE-1462). It will also move us 
> toward having serialized objects for Mappers, Reducers, etc (MAPREDUCE-1183).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
