Re: Decimal binary encoding

Ryan Blue Wed, 16 Nov 2016 14:59:53 -0800

The intent was for binary to store the minimum number of bytes for each
unscaled value, so the encoded length depends on the value. Fixed should be
used if you want to store all values with the same number of bytes, because
that avoids writing a length for each byte array. Binary works well for the
case you described, where you have a large precision but enough small values
to offset the cost of storing the length.
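
A minimal sketch of the two options, assuming the unscaled value is written
as a big-endian two's complement integer as LogicalTypes.md describes. The
function names are made up for illustration and are not taken from Impala or
any Parquet library.

```python
def encode_unscaled_minimal(unscaled: int) -> bytes:
    """BINARY-style: use the fewest bytes that hold this one value."""
    # For negative numbers, ~unscaled gives the magnitude that decides how
    # many bits are needed in two's complement (e.g. -128 fits in 8 bits).
    magnitude = unscaled if unscaled >= 0 else ~unscaled
    # One extra bit for the sign, rounded up to whole bytes.
    num_bytes = (magnitude.bit_length() + 8) // 8
    return unscaled.to_bytes(num_bytes, byteorder="big", signed=True)


def fixed_length_for_precision(precision: int) -> int:
    """FIXED-style: bytes needed for the largest value of this precision."""
    largest = 10 ** precision - 1
    return (largest.bit_length() + 8) // 8


def encode_unscaled_fixed(unscaled: int, precision: int) -> bytes:
    """FIXED-style: every value in the column gets the same byte length."""
    num_bytes = fixed_length_for_precision(precision)
    return unscaled.to_bytes(num_bytes, byteorder="big", signed=True)


if __name__ == "__main__":
    for value in (8, 550, -1):
        minimal = encode_unscaled_minimal(value)
        fixed = encode_unscaled_fixed(value, precision=38)
        print(value, len(minimal), len(fixed))
```

With precision 38 the fixed form takes 16 bytes per value, while 8 and 550
take 1 and 2 bytes in the minimal form. Each binary value also carries a
length, so the minimal form only pays off when enough values are small.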


rb

On Wed, Nov 16, 2016 at 2:41 PM, Henry Robinson <[email protected]> wrote:

> Hi -
>
> I'm adding binary encoding support for decimal to Impala, and have one
> question about some wording in the spec:
>
> "binary: precision is not limited, but is required. The minimum number of
> bytes to store the unscaled value should be used"
>
> https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal
>
> When the spec says 'the minimum number of bytes', which of the following
> does that mean:
>
> * the minimum number of bytes to store a particular unscaled value must be
> used (so for '8' it's one byte, for '550' it's two bytes and so on), and
> the encoded length is value dependent.
>
> or
>
> * the minimum number of bytes for the given precision must be used (so all
> values in a given column should have the same byte length).
>
> If it's the latter, the implementation is much easier because
> FIXED_LEN_BYTE_ARRAY becomes a special case of BINARY, but the former
> offers more opportunity for compact representations on a high precision
> column that in practice has low precision values.
>
> Thanks,
> Henry
>



-- 
Ryan Blue
Software Engineer
Netflix
