[ https://issues.apache.org/jira/browse/ORC-209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matt McCline updated ORC-209:
-----------------------------
    Summary: Add Decimal64 Serialization/Deserialization (Part 1)  (was: Add Decimal64 Serialization/Deserialization)

> Add Decimal64 Serialization/Deserialization (Part 1)
> ----------------------------------------------------
>
>                 Key: ORC-209
>                 URL: https://issues.apache.org/jira/browse/ORC-209
>             Project: ORC
>          Issue Type: Bug
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Critical
>             Fix For: 2.0.0
>
>         Attachments: ORC-209.01.wip.patch, ORC-209.02.wip.patch,
>                      ORC-209.03.patch, storage-api.01.wip.patch,
>                      storage-api.02.wip.patch
>
>
> Currently, HiveDecimal is serialized in ORC in a special binary bytes format:
> a "value" stream plus a secondary stream holding the scale of each decimal.
> Trailing zeroes are removed from each decimal, so the scale can vary from
> value to value. This format is inefficient in both CPU and storage space
> (i.e. compression).
>
> The decimal type has a fixed precision and scale. Gopal/Prasanth/Owen have
> suggested storing the decimals with their trailing zeroes (so the scale is a
> constant value for the file, taken from the metadata) and storing them as an
> integer stream that can benefit from run-length encoding, compression, etc.
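
The difference between the two layouts described above can be sketched in a few lines of Java. This is only an illustration of the idea, not the code in the attached patches: the class name Decimal64Sketch, its method names, and the decimal(10,2) column are assumptions made for the example, and java.math.BigDecimal stands in for HiveDecimal. It relies on the fact that any decimal with precision 18 or less has an unscaled value that fits in a signed 64-bit long.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical sketch; names are illustrative and not taken from ORC or Hive.
public class Decimal64Sketch {

    // Current-style view: trailing zeroes are stripped, so every value
    // carries its own scale and a second per-value scale stream is needed.
    static BigDecimal trimmed(BigDecimal d) {
        return d.stripTrailingZeros();
    }

    // Proposed-style view: rescale to the column's fixed scale and keep only
    // the unscaled integer. RoundingMode.UNNECESSARY makes setScale throw
    // ArithmeticException if the value has more fractional digits than the
    // column scale; longValueExact throws if the result overflows a long.
    static long toDecimal64(BigDecimal d, int columnScale) {
        return d.setScale(columnScale, RoundingMode.UNNECESSARY)
                .unscaledValue()
                .longValueExact();
    }

    // Reading back only needs the long and the scale from the metadata.
    static BigDecimal fromDecimal64(long unscaled, int columnScale) {
        return BigDecimal.valueOf(unscaled, columnScale);
    }

    public static void main(String[] args) {
        int columnScale = 2; // assumed column type: decimal(10,2)
        String[] inputs = {"12.5", "3", "0.07"};
        for (String s : inputs) {
            BigDecimal d = new BigDecimal(s);
            long encoded = toDecimal64(d, columnScale);
            System.out.println(s
                    + ": per-value scale after trimming = " + trimmed(d).scale()
                    + ", fixed-scale long = " + encoded
                    + ", decoded = " + fromDecimal64(encoded, columnScale));
        }
    }
}
```

With a fixed scale, the column reduces to a plain sequence of longs, which is the kind of data an integer run-length encoder is designed to handle, and the per-value scale stream is no longer needed.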