Decimal binary encoding

Henry Robinson Wed, 16 Nov 2016 14:42:58 -0800

Hi -

I'm adding binary encoding support for decimal to Impala, and have one
question about some wording in the spec:


"binary: precision is not limited, but is required. The minimum number of
bytes to store the unscaled value should be used"

https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal

When the spec says 'the minimum number of bytes', which of the following
does that mean:

* the minimum number of bytes to store a particular unscaled value must be
used (so for '8' it's one byte, for '550' it's two bytes and so on), and
the encoded length is value dependent.

or

* the minimum number of bytes for the given precision must be used (so all
values in a given column should have the same byte length).

If it's the latter, the implementation is much easier because
FIXED_LEN_BYTE_ARRAY becomes a special case of BINARY. The former,
however, allows a more compact representation for a high-precision
column whose values are, in practice, small.
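
To make the two readings concrete, here is a rough Python sketch. It is
not Impala or Parquet library code; the helper names are made up for
illustration. It assumes the unscaled value is stored as a big-endian
two's complement integer, as the decimal section of the spec describes.

```python
def min_bytes_for_value(unscaled: int) -> int:
    """Reading 1: fewest bytes that hold this particular unscaled value."""
    # int.bit_length() ignores the sign, so add one bit for it.
    # For negatives, ~x == -x - 1 gives the magnitude that must fit.
    magnitude = ~unscaled if unscaled < 0 else unscaled
    bits = magnitude.bit_length() + 1
    return (bits + 7) // 8  # round up to whole bytes (at least 1)


def min_bytes_for_precision(precision: int) -> int:
    """Reading 2: fewest bytes that hold any value of the given precision."""
    # The largest unscaled magnitude with `precision` digits is 10**p - 1.
    return min_bytes_for_value(10 ** precision - 1)


def encode(unscaled: int, length: int) -> bytes:
    """Big-endian two's complement, sign-extended to `length` bytes."""
    return unscaled.to_bytes(length, byteorder="big", signed=True)


# Reading 1: the encoded length depends on the value.
per_value = [encode(v, min_bytes_for_value(v)) for v in (8, 550)]

# Reading 2: every value in a DECIMAL(9, s) column gets the same length.
width = min_bytes_for_precision(9)
per_column = [encode(v, width) for v in (8, 550)]
```

Under the first reading '8' and '550' come out as one and two bytes;
under the second, both are padded to the four bytes that a precision of
9 requires.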

Thanks,
Henry
