[
https://issues.apache.org/jira/browse/HADOOP-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773152#action_12773152
]
Hong Tang commented on HADOOP-4243:
-----------------------------------
TFile does not store objects and is oblivious to the Serialization framework.
> Serialization framework use SequenceFile/TFile/Other metadata to instantiate
> deserializer
> -----------------------------------------------------------------------------------------
>
> Key: HADOOP-4243
> URL: https://issues.apache.org/jira/browse/HADOOP-4243
> Project: Hadoop Common
> Issue Type: Improvement
> Components: contrib/serialization
> Reporter: Pete Wyckoff
>
> SequenceFile metadata is useful for storing additional information about the
> serialized data, for example, for RecordIO, whether the data is CSV or
> Binary. For thrift, the same thing - Binary, JSON, ...
> For Hive, this may be especially important, because it has a Dynamic generic
> serializer/deserializer that takes its DDL at runtime (as opposed to RecordIO
> and Thrift which require pre-compilation into a specific class whose name can
> be stored in the sequence file key or value class). In this case, the class
> name is like Record.java in RecordIO - it doesn't tell you anything without
> the DDL.
> One way to address this could be adding the sequence file metadata to the
> getDeserializer call in Serialization interface. The api would then be
> something like getDeserializer(Class<?>, Map<Text, Text> metadata) or
> Properties metadata.
> But, I am open to proposals.
> This also means that saying a class implements Writable is not enough to
> necessarily deserialize it since it may do specific actions based on the
> metadata - e.g., RecordIO might determine whether to use CSV rather than the
> default Binary deserialization.
> There is also the issue of getSerializer returning the metadata to be
> written to the SequenceFile/TFile.
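
For illustration only, the getDeserializer(Class<?>, Map<Text, Text> metadata) idea from the description could look roughly like the sketch below. Everything in it is an assumption made for the example: the interfaces are simplified stand-ins rather than the real org.apache.hadoop.io.serializer types, Map<String, String> replaces Map<Text, Text> so the code compiles without Hadoop, the "record.format" key is hypothetical, and "binary" is simulated as length-prefixed UTF-8 fields.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for a deserializer (not Hadoop's interface).
interface SketchDeserializer<T> {
    T deserialize(byte[] raw);
}

// Hypothetical variant of a Serialization whose getDeserializer also receives file metadata.
interface MetadataAwareSerialization<T> {
    boolean accept(Class<?> c);
    SketchDeserializer<T> getDeserializer(Class<?> c, Map<String, String> metadata);
}

// Generic record: the class name alone does not say how the bytes are encoded.
final class SketchRecord {
    final List<String> fields;
    SketchRecord(List<String> fields) { this.fields = fields; }
}

final class SketchRecordSerialization implements MetadataAwareSerialization<SketchRecord> {
    static final String FORMAT_KEY = "record.format"; // hypothetical metadata key

    public boolean accept(Class<?> c) {
        return SketchRecord.class.isAssignableFrom(c);
    }

    public SketchDeserializer<SketchRecord> getDeserializer(Class<?> c, Map<String, String> metadata) {
        // Default to binary when the file carries no format entry.
        String format = metadata.getOrDefault(FORMAT_KEY, "binary");
        if ("csv".equals(format)) {
            return SketchRecordSerialization::parseCsv;
        }
        return SketchRecordSerialization::parseBinary;
    }

    // CSV: comma-separated UTF-8 text (no quoting or escaping in this sketch).
    static SketchRecord parseCsv(byte[] raw) {
        String text = new String(raw, StandardCharsets.UTF_8);
        return new SketchRecord(Arrays.asList(text.split(",", -1)));
    }

    // Simulated binary: each field is one length byte followed by that many UTF-8 bytes.
    static SketchRecord parseBinary(byte[] raw) {
        List<String> fields = new ArrayList<>();
        int i = 0;
        while (i < raw.length) {
            int len = raw[i++] & 0xff;
            fields.add(new String(raw, i, len, StandardCharsets.UTF_8));
            i += len;
        }
        return new SketchRecord(fields);
    }
}

class MetadataDeserializerSketch {
    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();
        metadata.put(SketchRecordSerialization.FORMAT_KEY, "csv");
        SketchRecord r = new SketchRecordSerialization()
                .getDeserializer(SketchRecord.class, metadata)
                .deserialize("a,b,c".getBytes(StandardCharsets.UTF_8));
        System.out.println(r.fields);
    }
}
```

The same class is passed in both cases; only the metadata selects the decoding path, which is the point the description makes about a class name not being enough on its own.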