Gopal V commented on ORC-209:

I'll submit a PR for the encodings table and document the decimal64 

> Add Decimal64 Serialization/Deserialization (Part 1)
> ----------------------------------------------------
>                 Key: ORC-209
>                 URL: https://issues.apache.org/jira/browse/ORC-209
>             Project: ORC
>          Issue Type: Bug
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Critical
>             Fix For: 2.0.0
>         Attachments: ORC-209.01.wip.patch, ORC-209.02.wip.patch, 
> ORC-209.03.patch, storage-api.01.wip.patch, storage-api.02.wip.patch
>
> Currently, HiveDecimal is serialized in ORC in a special binary bytes format 
> as the "value" stream, with a secondary stream holding the scale of each 
> decimal. Trailing zeroes are removed from each decimal, so the scale can vary 
> from value to value. This format is inefficient in both CPU and storage 
> space (i.e. compression).
>
> The decimal type has a fixed precision and scale. Gopal/Prasanth/Owen have 
> suggested storing the decimals with their trailing zeroes (so the scale is a 
> constant value for the file, taken from the metadata) and storing them as an 
> integer stream that can benefit from run-length encoding and other 
> compression.
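
To make the difference between the two layouts described above concrete, here is a rough sketch in Python. It is not the ORC reader/writer code: the function names are made up for illustration, only finite values are handled, and the signed 64-bit bound is my assumption about what "decimal64" implies. It only shows the idea of a per-value scale versus one constant scale with plain integers.

```python
from decimal import Decimal

# Assumption: "decimal64" means the unscaled value must fit in a signed 64-bit integer.
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def encode_varying_scale(value: Decimal) -> tuple[int, int]:
    """Strip trailing zeroes and return (unscaled, scale); the scale varies per value."""
    stripped = value.normalize()
    if stripped.as_tuple().exponent > 0:
        # normalize() turns 100 into 1E+2; keep the scale non-negative in this sketch.
        stripped = stripped.quantize(Decimal(1))
    scale = -stripped.as_tuple().exponent
    return int(stripped.scaleb(scale)), scale


def encode_fixed_scale(value: Decimal, scale: int) -> int:
    """Keep trailing zeroes: return the unscaled integer at one constant scale."""
    scaled = value.scaleb(scale)
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{value} has more than {scale} fractional digits")
    unscaled = int(scaled)
    if not INT64_MIN <= unscaled <= INT64_MAX:
        raise OverflowError(f"{value} does not fit in 64 bits at scale {scale}")
    return unscaled


def decode_fixed_scale(unscaled: int, scale: int) -> Decimal:
    """Rebuild the decimal from the integer and the constant scale."""
    return Decimal(unscaled).scaleb(-scale)


values = [Decimal("1.50"), Decimal("12.3"), Decimal("7")]

# Per-value scale: two pieces of data per decimal.
print([encode_varying_scale(v) for v in values])   # [(15, 1), (123, 1), (7, 0)]

# Constant scale of 2: a single integer per decimal.
print([encode_fixed_scale(v, 2) for v in values])  # [150, 1230, 700]
```

With the constant scale, the column reduces to one list of integers, which is the shape an integer run-length encoder can work on directly.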
