This is an automated email from the ASF dual-hosted git repository.

emkornfield pushed a commit to branch production
in repository https://gitbox.apache.org/repos/asf/parquet-site.git


The following commit(s) were added to refs/heads/production by this push:
     new 19eb00f  PARQUET-2310: implementation status (#34)
19eb00f is described below

commit 19eb00ff7251e877cc4e3a69fd9496a5002f0b25
Author: Ádám Lippai <[email protected]>
AuthorDate: Thu Jul 4 13:02:14 2024 -0400

    PARQUET-2310: implementation status (#34)
    
    Add outline of implementation status tables.
    
    Co-authored-by: Andrew Lamb <[email protected]>
---
 .../en/docs/File Format/implementationstatus.md    | 124 +++++++++++++++++++++
 1 file changed, 124 insertions(+)

diff --git a/content/en/docs/File Format/implementationstatus.md b/content/en/docs/File Format/implementationstatus.md
new file mode 100644
index 0000000..6453373
--- /dev/null
+++ b/content/en/docs/File Format/implementationstatus.md       
@@ -0,0 +1,124 @@
+---
+title: "Implementation status"
+linkTitle: "Implementation status"
+weight: 8
+---
+
+This page summarizes the features supported by different Parquet
+implementations.
+
+*Note*: This is a work in progress and we would welcome help expanding its scope.
+
+### Legend
+The value in each box means:
+* ✅: supported
+* ❌: not supported
+* (blank) no data
+
+Implementations:
+* `C++`: [parquet-cpp](https://github.com/apache/arrow/tree/main/cpp/src/parquet)
+* `Java`: [parquet-java](https://github.com/apache/parquet-java)
+* `Go`: [parquet-go](https://github.com/apache/arrow/tree/main/go/parquet)
+* `Rust`: [parquet-rs](https://github.com/apache/arrow-rs/blob/master/parquet/README.md)
+
+
+
+### Physical types
+
+| Data type                                 | C++   | Java   | Go    | Rust  |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| BOOLEAN                                   |       |        |       |       |
+| INT32                                     |       |        |       |       |
+| INT64                                     |       |        |       |       |
+| INT96 (1)                                 |       |        |       |       |
+| FLOAT                                     |       |        |       |       |
+| DOUBLE                                    |       |        |       |       |
+| BYTE_ARRAY                                |       |        |       |       |
+| FIXED_LEN_BYTE_ARRAY                      |       |        |       |       |
+
+* \(1) This type is deprecated, but as of 2024 it's common in currently produced parquet files
+
+
+### Logical types
+
+| Data type                                 | C++   | Java   | Go    | Rust  |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| STRING                                    |       |        |       |       |
+| ENUM                                      |       |        |       |       |
+| UUID                                      |       |        |       |       |
+| 8, 16, 32, 64 bit signed and unsigned INT |       |        |       |       |
+| DECIMAL (INT32)                           |       |        |       |       |
+| DECIMAL (INT64)                           |       |        |       |       |
+| DECIMAL (BYTE_ARRAY)                      |       |        |       |       |
+| DECIMAL (FIXED_LEN_BYTE_ARRAY)            |       |        |       |       |
+| DATE                                      |       |        |       |       |
+| TIME (INT32)                              |       |        |       |       |
+| TIME (INT64)                              |       |        |       |       |
+| TIMESTAMP (INT64)                         |       |        |       |       |
+| INTERVAL                                  |       |        |       |       |
+| JSON                                      |       |        |       |       |
+| BSON                                      |       |        |       |       |
+| LIST                                      |       |        |       |       |
+| MAP                                       |       |        |       |       |
+| UNKNOWN (always null)                     |       |        |       |       |
+| FLOAT16                                   |       |        |       |       |
+
+### Encodings
+
+| Encoding                                  | C++   | Java   | Go    | Rust  |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| PLAIN                                     |       |        |       |       |
+| PLAIN_DICTIONARY                          |       |        |       |       |
+| RLE_DICTIONARY                            |       |        |       |       |
+| RLE                                       |       |        |       |       |
+| BIT_PACKED (deprecated)                   |       |        |       |       |
+| DELTA_BINARY_PACKED                       |       |        |       |       |
+| DELTA_LENGTH_BYTE_ARRAY                   |       |        |       |       |
+| DELTA_BYTE_ARRAY                          |       |        |       |       |
+| BYTE_STREAM_SPLIT                         |       |        |       |       |
+
+### Compressions
+
+| Compression                               | C++   | Java   | Go    | Rust  |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| UNCOMPRESSED                              |       |        |       |       |
+| BROTLI                                    |       |        |       |       |
+| GZIP                                      |       |        |       |       |
+| LZ4 (deprecated)                          |       |        |       |       |
+| LZ4_RAW                                   |       |        |       |       |
+| LZO                                       |       |        |       |       |
+| SNAPPY                                    |       |        |       |       |
+| ZSTD                                      |       |        |       |       |
+
+### Other format level features
+
+|                                           | C++   | Java   | Go    | Rust  |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| xxxHash-based bloom filters               |       |        |       |       |
+| Bloom filter length (1)                   |       |        |       |       |
+| Statistics min_value, max_value           |       |        |       |       |
+| Page index                                |       |        |       |       |
+| Page CRC32 checksum                       |       |        |       |       |
+| Modular encryption                        |       |        |       |       |
+| Size statistics (2)                       |       |        |       |       |
+
+
+* \(1) In parquet.thrift: ColumnMetaData->bloom_filter_length
+
+* \(2) In parquet.thrift: ColumnMetaData->size_statistics
+
+### High level data APIs for Parquet feature usage
+
+| Format                                       | C++   | Java   | Go    | Rust  |
+| -------------------------------------------- | ----- | ------ | ----- | ----- |
+| External column data (1)                     |       |        |       |       |
+| Row group "Sorting column" metadata (2)      |       |        |       |       |
+| Row group pruning using statistics           |       |        |       |       |
+| Reading select columns only                  |       |        |       |       |
+| Page pruning using statistics                |       |        |       |       |
+| Page pruning using bloom filter              |       |        |       |       |
+
+
+* \(1) In parquet.thrift: ColumnChunk->file_path
+
+* \(2) In parquet.thrift: RowGroup->sorting_columns
