[
https://issues.apache.org/jira/browse/HADOOP-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933475#action_12933475
]
Owen O'Malley commented on HADOOP-6685:
---------------------------------------
Tom, I understood your point. You aren't seeing mine.
There are three use cases (using Thrift as the library in question):
* _The job is not using Thrift._ In this case it doesn't matter whether the
thrift jar and the plugin class are on the classpath.
* _The job is using the same version of Thrift._ The two solutions look like:
** If the plugin is built into Hadoop, the user just uses their class.
** If the plugin is a separate jar, the application must copy both the thrift
jar and the thrift plugin into HDFS and put them in its distributed cache.
It must also put them on the task's (and the launching node's) classpath.
* _The job is using a different version of Thrift._ This is the 1% case. The
two solutions look like:
** If the plugin is built into Hadoop, the application must put the thrift jar
into the distributed cache.
** If the plugin is separate, the application must put the thrift jar and the
plugin jar into the distributed cache.
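To make the comparison concrete, here is a minimal sketch of the cases above as a lookup. The jar names and the `jars_to_ship` function are hypothetical, invented purely for illustration; nothing here is a Hadoop API.

```python
# Hypothetical model of the three use cases; names are illustrative only.
THRIFT_JAR = "thrift.jar"          # assumed jar name
PLUGIN_JAR = "thrift-plugin.jar"   # assumed jar name


def jars_to_ship(uses_thrift, same_version, plugin_bundled):
    """Return the jars the application must put in the distributed cache."""
    if not uses_thrift:
        # Case 1: the job does not use Thrift, so nothing needs shipping.
        return []
    if plugin_bundled:
        # Plugin built into Hadoop: only a differing Thrift version is shipped.
        return [] if same_version else [THRIFT_JAR]
    # Plugin in a separate jar: both jars are shipped regardless of version.
    return [THRIFT_JAR, PLUGIN_JAR]


if __name__ == "__main__":
    for same_version in (True, False):
        for plugin_bundled in (True, False):
            print(same_version, plugin_bundled,
                  jars_to_ship(True, same_version, plugin_bundled))
```

With the plugin built in, the common same-version case ships nothing, and even the 1% case ships one jar fewer than the separate-plugin approach.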
Also note that with the distributed cache, user mistakes can easily leave a
copy of the thrift jar and the plugin per job or per user on every slave
node.
In summary, the user can always override the version distributed with Hadoop.
The question is just how convenient we can make the standard use cases. We have
added many dependencies over the years and they've never provoked this kind of
objection.
> Change the generic serialization framework API to use serialization-specific
> bytes instead of Map<String,String> for configuration
> ----------------------------------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-6685
> URL: https://issues.apache.org/jira/browse/HADOOP-6685
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Owen O'Malley
> Assignee: Owen O'Malley
> Fix For: 0.22.0
>
> Attachments: libthrift.jar, serial.patch, serial4.patch,
> serial6.patch, SerializationAtSummit.pdf
>
>
> Currently, the generic serialization framework uses Map<String,String> for
> the serialization specific configuration. Since this data is really internal
> to the specific serialization, I think we should change it to be an opaque
> binary blob. This will simplify the interface for defining specific
> serializations for different contexts (MAPREDUCE-1462). It will also move us
> toward having serialized objects for Mappers, Reducers, etc (MAPREDUCE-1183).