nkaki opened a new issue, #550:
URL: https://github.com/apache/parquet-format/issues/550
### Describe the enhancement requested
I propose adding an overview table to Encodings.md that lists each encoding,
the supported data types and targets, and the applicable Parquet format
versions. The goal is to provide a quick reference near the top of the page
while keeping the detailed explanations further down.
Currently, encodings are explained one by one. While the detailed
descriptions are useful, I believe a concise table will help users quickly
identify which encodings are relevant for a given data type and format version.
The table will be placed before the explanation of the Plain encoding. It
will highlight version-specific differences in supported types (e.g., which
encodings and types are supported only in later format versions).
I have ensured that the terminology matches the existing documentation and
verified the version information against past Thrift files and documentation.
If you have any comments or suggestions, please let me know; I’m happy to
update the proposal. I can also open a PR with the table if the maintainers are
comfortable with this approach. This is my first contribution here, so please
let me know if there are any preferred formats or processes I should follow.
Example of the table:
### Overview
#### Parquet Format 1.0.0+
| Encoding type                            | Encoding enum        | Encoding targets                 |
| ---------------------------------------- | -------------------- | -------------------------------- |
| Plain                                    | PLAIN = 0            | All Physical Types               |
| Dictionary Encoding                      | PLAIN_DICTIONARY = 2 | All Physical Types               |
| Run Length Encoding / Bit-Packing Hybrid | RLE = 3              | definition and repetition levels |
| Bit-packed                               | BIT_PACKED = 4       | definition and repetition levels |
#### Parquet Format 2.0.0+
| Encoding type                            | Encoding enum                     | Encoding targets |
| ---------------------------------------- | --------------------------------- | ---------------- |
| Plain                                    | PLAIN = 0                         | All Physical Types, dictionary entries in dictionary page |
| Dictionary Encoding (Plain)              | PLAIN_DICTIONARY = 2 (Deprecated) | All Physical Types |
| Run Length Encoding / Bit-Packing Hybrid | RLE = 3                           | BOOLEAN |
| Bit-packed (Deprecated)                  | BIT_PACKED = 4                    | N/A |
| Delta Encoding                           | DELTA_BINARY_PACKED = 5           | INT32, INT64 |
| Delta-length byte array                  | DELTA_LENGTH_BYTE_ARRAY = 6       | BYTE_ARRAY |
| Delta Strings                            | DELTA_BYTE_ARRAY = 7              | BYTE_ARRAY, FIXED_LEN_BYTE_ARRAY |
| Dictionary Encoding (RLE)                | RLE_DICTIONARY = 8                | All Physical Types, repetition/definition levels, dictionary indices in data pages |
| Byte Stream Split (2.8.0+)               | BYTE_STREAM_SPLIT = 9             | FLOAT (2.8.0+), DOUBLE (2.8.0+), INT32 (2.11.0+), INT64 (2.11.0+), FIXED_LEN_BYTE_ARRAY (2.11.0+) |
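
To illustrate the kind of lookup the table is meant to make easy, here is a minimal Python sketch that answers "which encodings apply to this physical type at this format version?". It is only an illustration, not part of the proposed documentation change. The names `PHYSICAL_TYPES`, `ENCODING_TABLE`, and `encodings_for` are hypothetical. The data is transcribed from the example tables above, restricted to physical-type targets (levels, dictionary pages, and dictionary indices are left out), and "All Physical Types" is assumed to mean the eight physical types in the format's `Type` enum.

```python
# Physical types assumed to be covered by "All Physical Types".
PHYSICAL_TYPES = (
    "BOOLEAN", "INT32", "INT64", "INT96",
    "FLOAT", "DOUBLE", "BYTE_ARRAY", "FIXED_LEN_BYTE_ARRAY",
)


def parse_version(text):
    """Turn '2.11.0' into (2, 11, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def all_types_since(version):
    """Helper: every physical type is supported starting at `version`."""
    return {physical_type: version for physical_type in PHYSICAL_TYPES}


# Encoding name -> {physical type: first format version listed in the tables}.
# BIT_PACKED is omitted because its 2.0.0+ target is listed as N/A.
ENCODING_TABLE = {
    "PLAIN": all_types_since("1.0.0"),
    "PLAIN_DICTIONARY": all_types_since("1.0.0"),  # deprecated in 2.0.0+
    "RLE": {"BOOLEAN": "2.0.0"},
    "DELTA_BINARY_PACKED": {"INT32": "2.0.0", "INT64": "2.0.0"},
    "DELTA_LENGTH_BYTE_ARRAY": {"BYTE_ARRAY": "2.0.0"},
    "DELTA_BYTE_ARRAY": {
        "BYTE_ARRAY": "2.0.0",
        "FIXED_LEN_BYTE_ARRAY": "2.0.0",
    },
    "RLE_DICTIONARY": all_types_since("2.0.0"),
    "BYTE_STREAM_SPLIT": {
        "FLOAT": "2.8.0",
        "DOUBLE": "2.8.0",
        "INT32": "2.11.0",
        "INT64": "2.11.0",
        "FIXED_LEN_BYTE_ARRAY": "2.11.0",
    },
}


def encodings_for(physical_type, format_version):
    """Return the sorted encoding names listed for a type at a given version."""
    if physical_type not in PHYSICAL_TYPES:
        raise ValueError(f"unknown physical type: {physical_type}")
    available = parse_version(format_version)
    return sorted(
        name
        for name, supported in ENCODING_TABLE.items()
        if physical_type in supported
        and parse_version(supported[physical_type]) <= available
    )
```

For example, `encodings_for("INT32", "2.8.0")` returns the four encodings listed for INT32 before 2.11.0, and `encodings_for("INT32", "2.11.0")` additionally includes `BYTE_STREAM_SPLIT`. Versions are compared as integer tuples so that 2.11.0 sorts after 2.8.0, which a plain string comparison would get wrong.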