Fernando Pereira created PARQUET-845:
----------------------------------------
Summary: Efficient storage for several INT_8 and INT_16
Key: PARQUET-845
URL: https://issues.apache.org/jira/browse/PARQUET-845
Project: Parquet
Issue Type: Wish
Reporter: Fernando Pereira
Priority: Minor
In very large datasets, aggregating several INT_8 values into a single INT32
field (or a byte array) can make a big difference in storage size.
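To illustrate the kind of aggregation meant here, below is a minimal sketch of packing four signed 8-bit values into one signed 32-bit integer and back. The function names and the little-endian lane order are assumptions made for illustration only; nothing like this exists in Parquet today.

```python
def pack_int8x4(values):
    """Pack four signed 8-bit ints into one signed 32-bit int.

    Hypothetical helper: lane 0 goes into the lowest byte.
    """
    if len(values) != 4:
        raise ValueError("expected exactly four values")
    packed = 0
    for lane, value in enumerate(values):
        if not -128 <= value <= 127:
            raise ValueError(f"{value} does not fit in a signed 8-bit int")
        # Keep the low 8 bits (two's complement) and shift into the lane.
        packed |= (value & 0xFF) << (8 * lane)
    # Reinterpret the unsigned 32-bit pattern as a signed 32-bit value.
    if packed >= 1 << 31:
        packed -= 1 << 32
    return packed


def unpack_int8x4(packed):
    """Recover the four signed 8-bit ints from a packed 32-bit int."""
    values = []
    for lane in range(4):
        byte = (packed >> (8 * lane)) & 0xFF
        # Sign-extend the byte back to a Python int.
        values.append(byte - 256 if byte >= 128 else byte)
    return values
```

The packed value can then be written as an ordinary INT32 column; the reader has to know the lane layout to unpack it, which is exactly the information a richer type annotation could carry.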
Parquet already has efficient encodings for INT32, so if the logical type is
INT_8 the encoded value might take up only one byte.
However, further optimizations could be made by allowing the user to specify
the types more precisely.
What about a BYTE_ARRAY logical type, backed by the FIXED_LEN_BYTE_ARRAY type
(or eventually INT_32)?
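As a rough sketch of what the user side could look like with a fixed-length byte array, the snippet below packs four INT_8 and two INT_16 values into one 8-byte record using Python's standard struct module. The record layout ("<4b2h") and the helper names are assumptions chosen for illustration, not a proposed Parquet API.

```python
import struct

# Hypothetical layout: four signed 8-bit values followed by two signed
# 16-bit values, little-endian, no padding. RECORD.size is 8 bytes.
RECORD = struct.Struct("<4b2h")


def encode_record(int8_values, int16_values):
    """Pack the small integers into one fixed-length byte string."""
    return RECORD.pack(*int8_values, *int16_values)


def decode_record(raw):
    """Split a fixed-length byte string back into its INT_8 and INT_16 parts."""
    fields = RECORD.unpack(raw)
    return list(fields[:4]), list(fields[4:])
```

Each encoded record would be stored as one FIXED_LEN_BYTE_ARRAY value of length 8, so six small integers share a single column value instead of occupying six separate columns.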
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)