(parquet-format) branch master updated: MINOR: Add summary table of encodings and supported types (in Encodings.md) (#550) (#552)

alamb Mon, 09 Feb 2026 06:49:26 -0800

This is an automated email from the ASF dual-hosted git repository.

alamb pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/parquet-format.git



The following commit(s) were added to refs/heads/master by this push:
     new 38818fa  MINOR: Add summary table of encodings and supported types (in Encodings.md) (#550) (#552)
38818fa is described below

commit 38818fa0e7efd54b535001a4448030a40619c2a3
Author: Naohiro Kakimura <[email protected]>
AuthorDate: Mon Feb 9 23:49:09 2026 +0900

    MINOR: Add summary table of encodings and supported types (in Encodings.md) (#550) (#552)
    
    * MINOR: Add summary table of encodings and supported types (in Encodings.md) (#550)
    
    * MINOR: Add summary table of encodings and supported types (in Encodings.md) (#550);
    
    * MINOR: Add summary table of encodings and supported types (in Encodings.md) (#550) - remove v1 related column, and separate tables for supported and deprecated encodings
    
    * Update Encodings.md
    
    Add Dictionary indices to encoding targets
    
    Co-authored-by: Andrew Lamb <[email protected]>
    
    * Update Encodings.md
    
    fix typo
    
    Co-authored-by: Gang Wu <[email protected]>
    
    * added note/link to the implementation status page
    
    ---------
    
    Co-authored-by: Andrew Lamb <[email protected]>
    Co-authored-by: Gang Wu <[email protected]>
---
 Encodings.md | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/Encodings.md b/Encodings.md
index e620e9a..1c766fb 100644
--- a/Encodings.md
+++ b/Encodings.md
@@ -25,6 +25,27 @@ This file contains the specification of all supported encodings.
 Unless otherwise stated in page or encoding documentation, any encoding can be
 used with any page type.
 
+### Supported Encodings
+
+For details on current implementation status, see the [Implementation Status](https://parquet.apache.org/docs/file-format/implementationstatus/#encodings) page.
+
+| Encoding type                                    | Encoding enum                                             | Supported Types                                   |
+| ------------------------------------------------ | --------------------------------------------------------- | ------------------------------------------------- |
+| [Plain](#PLAIN)                                  | PLAIN = 0                                                 | All Physical Types                                |
+| [Dictionary Encoding](#DICTIONARY)               | PLAIN_DICTIONARY = 2 (Deprecated) <br> RLE_DICTIONARY = 8 | All Physical Types                                |
+| [Run Length Encoding / Bit-Packing Hybrid](#RLE) | RLE = 3                                                   | BOOLEAN, Dictionary Indices                       |
+| [Delta Encoding](#DELTAENC)                      | DELTA_BINARY_PACKED = 5                                   | INT32, INT64                                      |
+| [Delta-length byte array](#DELTALENGTH)          | DELTA_LENGTH_BYTE_ARRAY = 6                               | BYTE_ARRAY                                        |
+| [Delta Strings](#DELTASTRING)                    | DELTA_BYTE_ARRAY = 7                                      | BYTE_ARRAY, FIXED_LEN_BYTE_ARRAY                  |
+| [Byte Stream Split](#BYTESTREAMSPLIT)            | BYTE_STREAM_SPLIT = 9                                     | INT32, INT64, FLOAT, DOUBLE, FIXED_LEN_BYTE_ARRAY |
+
+### Deprecated Encodings
+
+| Encoding type                         | Encoding enum  |
+| ------------------------------------- | -------------- |
+| [Bit-packed (Deprecated)](#BITPACKED) | BIT_PACKED = 4 |
+
+
 <a name="PLAIN"></a>
 ### Plain: (PLAIN = 0)
 
@@ -50,6 +71,7 @@ For native types, this outputs the data as little endian. Floating
 For the byte array type, it encodes the length as a 4 byte little
 endian, followed by the bytes.
 
+<a name="DICTIONARY"></a>
 ### Dictionary Encoding (PLAIN_DICTIONARY = 2 and RLE_DICTIONARY = 8)
 The dictionary encoding builds a dictionary of values encountered in a given column. The
 dictionary will be stored in a dictionary page per column chunk. The values are stored as integers
@@ -295,6 +317,7 @@ The encoded data is
 This encoding is similar to the [RLE/bit-packing](#RLE) encoding. However the [RLE/bit-packing](#RLE) encoding is specifically used when the range of ints is small over the entire page, as is true of repetition and definition levels. It uses a single bit width for the whole page.
 The delta encoding algorithm described above stores a bit width per miniblock and is less sensitive to variations in the size of encoded integers. It is also somewhat doing RLE encoding as a block containing all the same values will be bit packed to a zero bit width thus being only a header.
 
+<a name="DELTALENGTH"></a>
 ### Delta-length byte array: (DELTA_LENGTH_BYTE_ARRAY = 6)
 
 Supported Types: BYTE_ARRAY
@@ -317,6 +340,7 @@ then the encoded data would be comprised of the following segments:
 - DeltaEncoding(5, 5, 6, 6) (the string lengths)
 - "HelloWorldFoobarABCDEF"
 
+<a name="DELTASTRING"></a>
 ### Delta Strings: (DELTA_BYTE_ARRAY = 7)
 
 Supported Types: BYTE_ARRAY, FIXED_LEN_BYTE_ARRAY
@@ -338,6 +362,7 @@ then the encoded data would be comprised of the following segments:
 
 Note that, even for FIXED_LEN_BYTE_ARRAY, all lengths are encoded despite the redundancy.
 
+<a name="BYTESTREAMSPLIT"></a>
 ### Byte Stream Split: (BYTE_STREAM_SPLIT = 9)
 
 Supported Types: FLOAT, DOUBLE, INT32, INT64, FIXED_LEN_BYTE_ARRAY
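
The summary table added by this commit can be read as a lookup from encoding to permitted physical types. The following is a minimal Python sketch of that reading, not part of the commit or of any Parquet library. The names `Encoding`, `SUPPORTED_TYPES`, `DICTIONARY_INDICES`, and `supports` are hypothetical, and the list of physical types is an assumption taken from the Parquet `Type` enum rather than from this diff. `BIT_PACKED` is left out of the map because the deprecated table lists no supported types for it.

```python
from enum import IntEnum


class Encoding(IntEnum):
    """Encoding enum values as listed in the two tables of the diff."""
    PLAIN = 0
    PLAIN_DICTIONARY = 2   # deprecated in favour of RLE_DICTIONARY
    RLE = 3
    BIT_PACKED = 4         # deprecated
    DELTA_BINARY_PACKED = 5
    DELTA_LENGTH_BYTE_ARRAY = 6
    DELTA_BYTE_ARRAY = 7
    RLE_DICTIONARY = 8
    BYTE_STREAM_SPLIT = 9


# Assumption: the physical types of the Parquet format (not listed in the diff).
ALL_PHYSICAL_TYPES = frozenset({
    "BOOLEAN", "INT32", "INT64", "INT96",
    "FLOAT", "DOUBLE", "BYTE_ARRAY", "FIXED_LEN_BYTE_ARRAY",
})

# Hypothetical marker: dictionary indices are an encoding target,
# not a physical type, so they get their own label here.
DICTIONARY_INDICES = "DICTIONARY_INDICES"

# One entry per row of the "Supported Encodings" table.
SUPPORTED_TYPES = {
    Encoding.PLAIN: ALL_PHYSICAL_TYPES,
    Encoding.PLAIN_DICTIONARY: ALL_PHYSICAL_TYPES,
    Encoding.RLE_DICTIONARY: ALL_PHYSICAL_TYPES,
    Encoding.RLE: frozenset({"BOOLEAN", DICTIONARY_INDICES}),
    Encoding.DELTA_BINARY_PACKED: frozenset({"INT32", "INT64"}),
    Encoding.DELTA_LENGTH_BYTE_ARRAY: frozenset({"BYTE_ARRAY"}),
    Encoding.DELTA_BYTE_ARRAY: frozenset({"BYTE_ARRAY", "FIXED_LEN_BYTE_ARRAY"}),
    Encoding.BYTE_STREAM_SPLIT: frozenset({
        "INT32", "INT64", "FLOAT", "DOUBLE", "FIXED_LEN_BYTE_ARRAY",
    }),
}


def supports(encoding: Encoding, target: str) -> bool:
    """Return True if the table lists `target` for `encoding`."""
    return target in SUPPORTED_TYPES.get(encoding, frozenset())
```

For example, `supports(Encoding.DELTA_BINARY_PACKED, "INT64")` is true, while `supports(Encoding.DELTA_BINARY_PACKED, "DOUBLE")` is false. Whether a particular reader or writer actually implements a given combination is a separate question, covered by the Implementation Status page linked in the diff.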
