[
https://issues.apache.org/jira/browse/HADOOP-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628479#action_12628479
]
Pete Wyckoff commented on HADOOP-3787:
--------------------------------------
This looks like a good addition. But there's another use case where one does
not know a priori the Thrift class (or, I would imagine, the same problem
exists for recordio) that should be used for serializing/deserializing; let's
not assume sequence files, and legacy data raises the same question. This is
what the Hive case looks like: we may even have a Thrift serde that takes its
DDL at runtime. Registering all of these classes doesn't seem very scalable,
and even for sequence files, a serializer that needs its DDL at runtime
wouldn't work.
It seems one needs some kind of meta-information, beyond the key or value
classes, that can be stored "somewhere" and then used to instantiate the
serializer/deserializer to make this use case work. Otherwise, one is stuck
using BytesWritable and having the application logic figure out how to
instantiate the real serializer/deserializer, somewhat like what was
proposed in: https://issues.apache.org/jira/browse/HADOOP-2429
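The idea above could be sketched roughly as follows. This is a hypothetical, self-contained illustration, not Hadoop's actual serializer API: the class name and DDL string stand in for the "meta-information stored somewhere" (e.g. a file header or table descriptor), and `RuntimeDeserializer`, `DdlDeserializer`, and `SerdeFactory` are invented names for the sketch.

```java
import java.lang.reflect.Constructor;
import java.nio.charset.StandardCharsets;

// Hypothetical interface standing in for a deserializer contract.
interface RuntimeDeserializer {
    Object deserialize(byte[] bytes);
}

// Example serde whose behavior is configured by a DDL string at runtime,
// rather than being fixed by a statically registered key/value class.
class DdlDeserializer implements RuntimeDeserializer {
    private final String ddl;

    public DdlDeserializer(String ddl) {
        this.ddl = ddl;
    }

    public Object deserialize(byte[] bytes) {
        // A real implementation would parse `bytes` according to `ddl`;
        // here we just tag the decoded string with the schema name.
        return ddl + ":" + new String(bytes, StandardCharsets.UTF_8);
    }
}

class SerdeFactory {
    // The class name and DDL are the stored meta-information; the
    // deserializer is instantiated reflectively at read time, so no
    // up-front registration of every class is needed.
    static RuntimeDeserializer fromMetadata(String className, String ddl) {
        try {
            Constructor<?> ctor =
                Class.forName(className).getDeclaredConstructor(String.class);
            return (RuntimeDeserializer) ctor.newInstance(ddl);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                "cannot instantiate serde from metadata: " + className, e);
        }
    }
}
```

With this shape, the application never hard-codes the serde: it reads the class name and DDL from wherever the metadata lives and asks the factory for a deserializer, which is roughly what falling back to BytesWritable forces the application to do by hand today.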
> Add serialization for Thrift
> ----------------------------
>
> Key: HADOOP-3787
> URL: https://issues.apache.org/jira/browse/HADOOP-3787
> Project: Hadoop Core
> Issue Type: Wish
> Components: examples, mapred
> Reporter: Tom White
> Attachments: hadoop-3787.patch, libthrift.jar
>
>
> Thrift (http://incubator.apache.org/thrift/) is a cross-language
> serialization and RPC framework. This issue is to write a ThriftSerialization to support
> using Thrift types in MapReduce programs, including an example program. This
> should probably go into contrib.
> (There is a prototype implementation in
> https://issues.apache.org/jira/secure/attachment/12370464/hadoop-serializer-v2.tar.gz)
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.