Matt McCline created ORC-209:
--------------------------------
Summary: Improve Decimal Serialization/Deserialization
Key: ORC-209
URL: https://issues.apache.org/jira/browse/ORC-209
Project: ORC
Issue Type: Bug
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical
Currently, ORC serializes a HiveDecimal as two streams: a "value" stream in a
special binary format, and a secondary stream holding the scale of each decimal.
Trailing zeroes are removed from each decimal, so the scale can vary from value
to value. This format is inefficient in both CPU cost and storage space (i.e.
it compresses poorly).
The decimal type has a fixed precision and scale. Gopal/Prasanth/Owen have
suggested storing the decimals with their trailing zeroes (so the scale is a
constant value for the file, taken from the metadata) and storing them as an
integer stream that can benefit from run-length encoding, compression, etc.
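
To make the difference concrete, here is a minimal sketch in Python of the two
layouts described above. It is not ORC's actual reader/writer code or its
on-disk encoding. The function names split_trimmed and fixed_scale are
hypothetical, and the column is assumed to be declared with scale 2.

```python
from decimal import Decimal
from typing import Tuple


def split_trimmed(value: Decimal) -> Tuple[int, int]:
    """Current-style layout (illustrative): drop trailing fractional zeroes
    and return (unscaled integer, per-value scale). Assumes a finite value."""
    sign, digits, exponent = value.as_tuple()
    unscaled = int("".join(str(d) for d in digits))
    scale = -exponent
    if scale < 0:
        # A positive exponent such as 1E+2 becomes scale 0.
        unscaled *= 10 ** (-scale)
        scale = 0
    while scale > 0 and unscaled % 10 == 0:
        unscaled //= 10
        scale -= 1
    return (-unscaled if sign else unscaled), scale


def fixed_scale(value: Decimal, scale: int) -> int:
    """Proposed-style layout (illustrative): keep trailing zeroes so every
    value shares the column's scale and only one integer is stored."""
    unscaled, own_scale = split_trimmed(value)
    if own_scale > scale:
        raise ValueError("value has more fractional digits than the column scale")
    return unscaled * 10 ** (scale - own_scale)


column = [Decimal("1.50"), Decimal("2.00"), Decimal("3.25"), Decimal("10.00")]

# Two streams: values and a scale that differs from row to row.
trimmed = [split_trimmed(v) for v in column]   # [(15, 1), (2, 0), (325, 2), (10, 0)]

# One integer stream; the scale (2) would live once in the file metadata.
fixed = [fixed_scale(v, 2) for v in column]    # [150, 200, 325, 1000]
```

With a constant scale the second stream disappears, and the remaining stream
holds plain integers of the kind that integer run-length encoding is designed
to handle.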