[ 
https://issues.apache.org/jira/browse/HIVE-43?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Thusoo updated HIVE-43:
------------------------------

    Component/s: Serializers/Deserializers

> [Hive] Port Hive's serialization/deserialization to the new Serialization 
> framework
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-43
>                 URL: https://issues.apache.org/jira/browse/HIVE-43
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Pete Wyckoff
>
> Problem 1: legacy data
> This is non-trivial because legacy Hive data is written as BytesWritable in 
> the value field of the SequenceFile.  The specific RecordIO/Thrift/X class 
> name is stored in the metastore. 
> If we write our own SequenceFileRecordReader, this is trivial, but the 
> standard reader assumes the SequenceFile records the actual class name, so 
> we cannot deserialize at this level: we would just get back BytesWritable. 
> We need the SequenceFileRecordReader to consult the Deserializer to find out 
> which class is actually being deserialized.
> I don't know whether writing data as plain BytesWritable and storing the 
> real class somewhere else is a common problem, but for us it is an issue.
> Otherwise, there will soon be a ThriftSerialization set of classes, and we 
> can add equivalents for our other SerDes.
> Problem 2: DynamicSerDe
> This is a serializer/deserializer that takes a Thrift DDL at *runtime* and 
> can serialize/deserialize both Thrift and non-Thrift data.  Thus, the class 
> name DynamicSerDe alone doesn't give you what you need, namely the DDL and 
> the protocol used for the serialization: Binary or Control Separated (in 
> theory JSON, XML, ...).
> We can store this DDL in the metastore (and we do), but then DynamicSerDe 
> can be used only with Hive.  Maybe we should output only to TFiles, where we 
> could put the DDL in the file's metadata.
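
To make Problem 1 concrete, here is a minimal sketch of a reader that asks
an external catalog which deserializer to use instead of trusting the class
name recorded with the file. It is an illustration under stated assumptions,
not Hive or Hadoop code: RecordDeserializer, CtrlASeparatedDeserializer,
FakeMetastore and LegacyAwareReader are all hypothetical names, and the
Ctrl-A separated format is only a placeholder for RecordIO/Thrift/X.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a deserializer; not the real Hive interface.
interface RecordDeserializer {
    Object deserialize(byte[] raw);
}

// Toy deserializer: splits the raw bytes on Ctrl-A (\u0001) into fields.
class CtrlASeparatedDeserializer implements RecordDeserializer {
    public Object deserialize(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8).split("\u0001", -1);
    }
}

// Hypothetical stand-in for the metastore: table name -> deserializer class name.
class FakeMetastore {
    private final Map<String, String> serdeByTable = new HashMap<>();

    void register(String table, String className) {
        serdeByTable.put(table, className);
    }

    String serdeClassFor(String table) {
        String name = serdeByTable.get(table);
        if (name == null) {
            throw new IllegalArgumentException("no deserializer registered for " + table);
        }
        return name;
    }
}

// Reader that treats the stored value as opaque bytes and resolves the
// real record class from the catalog rather than from the file itself.
class LegacyAwareReader {
    private final RecordDeserializer deserializer;

    LegacyAwareReader(FakeMetastore metastore, String table) {
        String className = metastore.serdeClassFor(table);
        try {
            deserializer = (RecordDeserializer) Class.forName(className)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + className, e);
        }
    }

    // valueBytes plays the role of the payload wrapped in BytesWritable.
    Object read(byte[] valueBytes) {
        return deserializer.deserialize(valueBytes);
    }
}

class LegacyReaderSketch {
    public static void main(String[] args) {
        FakeMetastore metastore = new FakeMetastore();
        metastore.register("legacy_table", CtrlASeparatedDeserializer.class.getName());

        LegacyAwareReader reader = new LegacyAwareReader(metastore, "legacy_table");
        byte[] value = "a\u0001b\u0001c".getBytes(StandardCharsets.UTF_8);
        String[] fields = (String[]) reader.read(value);
        System.out.println(String.join(",", fields));
    }
}
```

The design point is only that the class lookup moves out of the file and
into whatever holds the table metadata. For Problem 2 the same lookup would
also have to return the DDL and the protocol, since the class name
DynamicSerDe does not carry them.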

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
