[
https://issues.apache.org/jira/browse/HIVE-553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723872#action_12723872
]
Zheng Shao commented on HIVE-553:
---------------------------------
4. Binary sortable: the byte order of the serialized byte arrays should match
the semantic order of the deserialized objects. (Thanks, Namit.)
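A minimal sketch of one common way to get property 4 for fixed-width numbers, in Java. The class SortableEncodingSketch and its methods are hypothetical and are not part of Hive. For ints it flips the sign bit. For doubles it flips all bits of negative values and only the sign bit of non-negative ones. Both are written big-endian, so comparing the bytes as unsigned values from left to right matches numeric order. Assumes Java 9+ for Arrays.compareUnsigned.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/**
 * Hypothetical sketch (not Hive code): order-preserving encodings for
 * int and double under unsigned, left-to-right byte comparison.
 */
public class SortableEncodingSketch {

    // Flip the sign bit so negative ints (sign bit 1) sort before
    // non-negative ints (sign bit 0) when bytes are compared unsigned.
    static byte[] encodeInt(int v) {
        return ByteBuffer.allocate(4).putInt(v ^ Integer.MIN_VALUE).array();
    }

    // Negative doubles: flip every bit, so larger magnitudes sort first.
    // Non-negative doubles: flip only the sign bit, so they sort after negatives.
    static byte[] encodeDouble(double d) {
        long bits = Double.doubleToLongBits(d);
        bits ^= (bits < 0) ? -1L : Long.MIN_VALUE;
        return ByteBuffer.allocate(8).putLong(bits).array(); // big-endian
    }

    public static void main(String[] args) {
        double[] values = {-2.5, -0.1, 0.0, 1.0, 3.75};
        for (int i = 0; i + 1 < values.length; i++) {
            int cmp = Arrays.compareUnsigned(
                    encodeDouble(values[i]), encodeDouble(values[i + 1]));
            System.out.println(values[i] + " < " + values[i + 1] + ": " + (cmp < 0));
        }
    }
}
```

With this scheme -0.0 sorts just before 0.0, and NaN sorts after positive infinity.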
Property 4 may conflict with properties 1 and 2, and we might eventually add a
separate LazyBinaryCompactSerDe. For now, though, we should add 4 so that we can
also use it to replace DynamicSerDe with the binary sortable protocol.
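The conflict between 4 and 1/2 shows up clearly with variable-length integers. The hypothetical Java sketch below uses a LEB128-style varint: 7 data bits per byte, low-order group first, with the high bit meaning "more bytes follow". Small values take one byte instead of four, but the encoded bytes no longer sort numerically. For example, 129 encodes as 0x81 0x01 and 256 as 0x80 0x02, so 256 compares as smaller. The class and method names are illustrative only.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/**
 * Hypothetical sketch: a compact LEB128-style varint that is NOT
 * order-preserving under unsigned lexicographic byte comparison.
 */
public class VarIntOrderSketch {

    static byte[] encodeVarInt(long v) {
        if (v < 0) {
            throw new IllegalArgumentException("sketch handles non-negative values only");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (v >= 0x80) {
            out.write((int) ((v & 0x7F) | 0x80)); // low 7 bits + continuation flag
            v >>>= 7;
        }
        out.write((int) v); // last group, continuation flag clear
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] a = encodeVarInt(129); // 0x81 0x01
        byte[] b = encodeVarInt(256); // 0x80 0x02
        System.out.println("1-byte encoding for 100: " + (encodeVarInt(100).length == 1));
        // 129 < 256 numerically, yet the encoded bytes compare the other way round.
        System.out.println("129 sorts after 256: " + (Arrays.compareUnsigned(a, b) > 0));
    }
}
```

A sortable format therefore has to give up this kind of length-dependent compaction (or use a scheme designed to keep order), which is the trade-off behind keeping a separate compact SerDe in mind.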
Let's change the name of this to LazyBinarySortableSerDe then.
> Add LazyBinarySerDe to Hive
> ---------------------------
>
> Key: HIVE-553
> URL: https://issues.apache.org/jira/browse/HIVE-553
> Project: Hadoop Hive
> Issue Type: New Feature
> Affects Versions: 0.4.0
> Reporter: Zheng Shao
>
> Currently the most popular SerDe in Hive is LazySimpleSerDe. LazySimpleSerDe
> has the benefit of being simple (it stores data in text format), but its
> performance may suffer in the following cases:
> 1. Double values are stored in text format, which is very space-inefficient,
> and both serialization and deserialization are slow;
> 2. For columns of complex types with many levels of nesting, we scan the
> buffer once per level, which is very inefficient.
> We should add a binary serde format that stores the data in binary format.
> The format should have the following properties:
> 1. Compact: it should be space-efficient;
> 2. Fast: it should be efficient to deserialize the data, especially for
> double values and complex types;
> 3. It should support serializing NULL values.
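To make properties 1-3 and the double-value problem concrete, here is a minimal Java sketch of one possible row layout. It is an assumption for illustration, not the actual LazyBinarySerDe format. The hypothetical class CompactRowSketch writes a null bitmap first, so NULLs cost one bit and no payload. Doubles are stored as 8 raw IEEE 754 bytes instead of text. Strings carry a 4-byte length prefix, so a reader can jump to a single field by skipping earlier ones rather than scanning for separators. Each field also carries a one-byte type tag to keep the sketch self-describing. A real SerDe would know the types from the table schema and could drop it.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical row layout, for illustration only:
 *   [null bitmap: ceil(n/8) bytes][field 0][field 1]...
 * Bit i set in the bitmap means field i is NULL and occupies no bytes.
 * Double field: 1 tag byte (0) + 8 bytes IEEE 754, big-endian.
 * String field: 1 tag byte (1) + 4-byte length + UTF-8 bytes.
 */
public class CompactRowSketch {
    static final byte TAG_DOUBLE = 0;
    static final byte TAG_STRING = 1;

    static byte[] serialize(Object[] fields) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] nullBitmap = new byte[(fields.length + 7) / 8];
        for (int i = 0; i < fields.length; i++) {
            if (fields[i] == null) {
                nullBitmap[i / 8] |= (byte) (1 << (i % 8));
            }
        }
        out.write(nullBitmap, 0, nullBitmap.length);
        for (Object f : fields) {
            if (f == null) {
                continue; // NULLs are recorded only in the bitmap
            }
            if (f instanceof Double) {
                out.write(TAG_DOUBLE);
                out.write(ByteBuffer.allocate(8).putDouble((Double) f).array(), 0, 8);
            } else {
                byte[] s = f.toString().getBytes(StandardCharsets.UTF_8);
                out.write(TAG_STRING);
                out.write(ByteBuffer.allocate(4).putInt(s.length).array(), 0, 4);
                out.write(s, 0, s.length);
            }
        }
        return out.toByteArray();
    }

    // Decode only field `index`; earlier fields are skipped using their
    // known sizes instead of being parsed.
    static Object getField(byte[] row, int numFields, int index) {
        if (index < 0 || index >= numFields) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        ByteBuffer buf = ByteBuffer.wrap(row);
        buf.position((numFields + 7) / 8); // jump past the null bitmap
        for (int i = 0; ; i++) {
            boolean isNull = (row[i / 8] & (1 << (i % 8))) != 0;
            if (isNull) {
                if (i == index) return null;
                continue;
            }
            byte tag = buf.get();
            int len = (tag == TAG_DOUBLE) ? 8 : buf.getInt();
            if (i == index) {
                if (tag == TAG_DOUBLE) return buf.getDouble();
                byte[] s = new byte[len];
                buf.get(s);
                return new String(s, StandardCharsets.UTF_8);
            }
            buf.position(buf.position() + len); // skip this field's payload
        }
    }

    public static void main(String[] args) {
        Object[] row = {0.1 + 0.2, null, "hive", 2.0};
        byte[] bytes = serialize(row);
        System.out.println("serialized row: " + bytes.length + " bytes");
        System.out.println("field 2: " + getField(bytes, row.length, 2));
        System.out.println("field 1: " + getField(bytes, row.length, 1));
        // For comparison: text length of field 0 vs. its fixed 8-byte payload.
        System.out.println("text length of field 0: " + Double.toString((Double) row[0]).length());
    }
}
```

For the sample row this layout uses 28 bytes: 1 bitmap byte, 9 bytes for each double, and 9 bytes for "hive". Reading field 2 never parses field 0. Note that a fixed 8 bytes is not always smaller than text (for example "1.0" is 3 characters). The gain is mainly for doubles with many significant digits and for avoiding text parsing.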
--
This message is automatically generated by JIRA.