This is an automated email from the ASF dual-hosted git repository.
alamb pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/parquet-format.git
The following commit(s) were added to refs/heads/master by this push:
new 38818fa MINOR: Add summary table of encodings and supported types (in Encodings.md) (#550) (#552)
38818fa is described below
commit 38818fa0e7efd54b535001a4448030a40619c2a3
Author: Naohiro Kakimura <[email protected]>
AuthorDate: Mon Feb 9 23:49:09 2026 +0900
MINOR: Add summary table of encodings and supported types (in Encodings.md) (#550) (#552)
* MINOR: Add summary table of encodings and supported types (in Encodings.md) (#550)
* MINOR: Add summary table of encodings and supported types (in Encodings.md) (#550);
* MINOR: Add summary table of encodings and supported types (in Encodings.md) (#550) - remove v1 related column, and separate tables for supported and deprecated encodings
* Update Encodings.md
Add Dictionary indices to encoding targets
Co-authored-by: Andrew Lamb <[email protected]>
* Update Encodings.md
fix typo
Co-authored-by: Gang Wu <[email protected]>
* added note/link to the implementation status page
---------
Co-authored-by: Andrew Lamb <[email protected]>
Co-authored-by: Gang Wu <[email protected]>
---
Encodings.md | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)
diff --git a/Encodings.md b/Encodings.md
index e620e9a..1c766fb 100644
--- a/Encodings.md
+++ b/Encodings.md
@@ -25,6 +25,27 @@ This file contains the specification of all supported encodings.
Unless otherwise stated in page or encoding documentation, any encoding can be
used with any page type.
+### Supported Encodings
+
+For details on current implementation status, see the [Implementation Status](https://parquet.apache.org/docs/file-format/implementationstatus/#encodings) page.
+
+| Encoding type                                    | Encoding enum                                             | Supported Types                                   |
+| ------------------------------------------------ | --------------------------------------------------------- | ------------------------------------------------- |
+| [Plain](#PLAIN)                                  | PLAIN = 0                                                 | All Physical Types                                |
+| [Dictionary Encoding](#DICTIONARY)               | PLAIN_DICTIONARY = 2 (Deprecated) <br> RLE_DICTIONARY = 8 | All Physical Types                                |
+| [Run Length Encoding / Bit-Packing Hybrid](#RLE) | RLE = 3                                                   | BOOLEAN, Dictionary Indices                       |
+| [Delta Encoding](#DELTAENC)                      | DELTA_BINARY_PACKED = 5                                   | INT32, INT64                                      |
+| [Delta-length byte array](#DELTALENGTH)          | DELTA_LENGTH_BYTE_ARRAY = 6                               | BYTE_ARRAY                                        |
+| [Delta Strings](#DELTASTRING)                    | DELTA_BYTE_ARRAY = 7                                      | BYTE_ARRAY, FIXED_LEN_BYTE_ARRAY                  |
+| [Byte Stream Split](#BYTESTREAMSPLIT)            | BYTE_STREAM_SPLIT = 9                                     | INT32, INT64, FLOAT, DOUBLE, FIXED_LEN_BYTE_ARRAY |
+
+### Deprecated Encodings
+
+| Encoding type | Encoding enum |
+| ------------------------------------- | -------------- |
+| [Bit-packed (Deprecated)](#BITPACKED) | BIT_PACKED = 4 |
+
+
<a name="PLAIN"></a>
### Plain: (PLAIN = 0)
@@ -50,6 +71,7 @@ For native types, this outputs the data as little endian. Floating
For the byte array type, it encodes the length as a 4 byte little endian, followed by the bytes.
+<a name="DICTIONARY"></a>
### Dictionary Encoding (PLAIN_DICTIONARY = 2 and RLE_DICTIONARY = 8)
The dictionary encoding builds a dictionary of values encountered in a given column. The
dictionary will be stored in a dictionary page per column chunk. The values are stored as integers
@@ -295,6 +317,7 @@ The encoded data is
This encoding is similar to the [RLE/bit-packing](#RLE) encoding. However the
[RLE/bit-packing](#RLE) encoding is specifically used when the range of ints is
small over the entire page, as is true of repetition and definition levels. It
uses a single bit width for the whole page.
The delta encoding algorithm described above stores a bit width per miniblock
and is less sensitive to variations in the size of encoded integers. It is also
somewhat doing RLE encoding as a block containing all the same values will be
bit packed to a zero bit width thus being only a header.
+<a name="DELTALENGTH"></a>
### Delta-length byte array: (DELTA_LENGTH_BYTE_ARRAY = 6)
Supported Types: BYTE_ARRAY
@@ -317,6 +340,7 @@ then the encoded data would be comprised of the following segments:
- DeltaEncoding(5, 5, 6, 6) (the string lengths)
- "HelloWorldFoobarABCDEF"
+<a name="DELTASTRING"></a>
### Delta Strings: (DELTA_BYTE_ARRAY = 7)
Supported Types: BYTE_ARRAY, FIXED_LEN_BYTE_ARRAY
@@ -338,6 +362,7 @@ then the encoded data would be comprised of the following segments:
Note that, even for FIXED_LEN_BYTE_ARRAY, all lengths are encoded despite the redundancy.
+<a name="BYTESTREAMSPLIT"></a>
### Byte Stream Split: (BYTE_STREAM_SPLIT = 9)
Supported Types: FLOAT, DOUBLE, INT32, INT64, FIXED_LEN_BYTE_ARRAY
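
The "Supported Encodings" table added by this commit can be read as a lookup from encoding enum to the physical types it accepts. The following is a minimal Python sketch of that reading, not part of the commit or of any Parquet library. The names `PHYSICAL_TYPES`, `ENCODINGS` and `is_supported` are hypothetical, and the expansion of "All Physical Types" into eight named types is an assumption based on the `Type` enum of the Parquet format definition. The RLE row also lists dictionary indices, which are not a physical type and are only noted in a comment.

```python
# Illustrative only: a lookup built from the summary table in this commit.
# All names here are hypothetical and belong to no Parquet implementation.

# Assumption: "All Physical Types" means the members of the Parquet Type enum.
PHYSICAL_TYPES = frozenset({
    "BOOLEAN", "INT32", "INT64", "INT96", "FLOAT", "DOUBLE",
    "BYTE_ARRAY", "FIXED_LEN_BYTE_ARRAY",
})

# Encoding name -> (enum value, physical types listed in the table).
ENCODINGS = {
    "PLAIN": (0, PHYSICAL_TYPES),
    "PLAIN_DICTIONARY": (2, PHYSICAL_TYPES),  # marked deprecated in the table
    "RLE_DICTIONARY": (8, PHYSICAL_TYPES),
    # RLE also encodes dictionary indices, which are not a physical type.
    "RLE": (3, frozenset({"BOOLEAN"})),
    "DELTA_BINARY_PACKED": (5, frozenset({"INT32", "INT64"})),
    "DELTA_LENGTH_BYTE_ARRAY": (6, frozenset({"BYTE_ARRAY"})),
    "DELTA_BYTE_ARRAY": (7, frozenset({"BYTE_ARRAY", "FIXED_LEN_BYTE_ARRAY"})),
    "BYTE_STREAM_SPLIT": (9, frozenset({
        "INT32", "INT64", "FLOAT", "DOUBLE", "FIXED_LEN_BYTE_ARRAY",
    })),
}


def is_supported(encoding: str, physical_type: str) -> bool:
    """Return True if the table lists physical_type for the given encoding."""
    _enum_value, types = ENCODINGS[encoding]
    return physical_type in types


if __name__ == "__main__":
    print(is_supported("DELTA_BINARY_PACKED", "INT64"))
    print(is_supported("DELTA_LENGTH_BYTE_ARRAY", "INT32"))
```

BIT_PACKED = 4 is left out on purpose, since the commit lists it in a separate "Deprecated Encodings" table without supported types.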