This is an automated email from the ASF dual-hosted git repository.
gangwu pushed a commit to branch production in repository https://gitbox.apache.org/repos/asf/parquet-site.git
The following commit(s) were added to refs/heads/production by this push:
new e61eab5 Update matrix for parquet-cpp and parquet-java (#100)
e61eab5 is described below
commit e61eab5a5489e812b6215c326f9cc1749165fd68
Author: Gang Wu <[email protected]>
AuthorDate: Thu Feb 13 10:45:18 2025 +0800
Update matrix for parquet-cpp and parquet-java (#100)
---
.../en/docs/File Format/implementationstatus.md | 116 +++++++++++----------
1 file changed, 59 insertions(+), 57 deletions(-)
diff --git a/content/en/docs/File Format/implementationstatus.md b/content/en/docs/File Format/implementationstatus.md
index 8b32876..3bd1d23 100644
--- a/content/en/docs/File Format/implementationstatus.md
+++ b/content/en/docs/File Format/implementationstatus.md
@@ -29,14 +29,14 @@ Implementations:
| Data type | C++ | Java | Go | Rust | cuDF |
| ----------------------------------------- | ----- | ----- | ----- | ----- | ----- |
-| BOOLEAN | | | | | ✅ |
-| INT32 | | | | | ✅ |
-| INT64 | | | | | ✅ |
-| INT96 (1) | | | | | ✅ |
-| FLOAT | | | | | ✅ |
-| DOUBLE | | | | | ✅ |
-| BYTE_ARRAY | | | | | ✅ |
-| FIXED_LEN_BYTE_ARRAY | | | | | ✅ |
+| BOOLEAN | ✅ | ✅ | | | ✅ |
+| INT32 | ✅ | ✅ | | | ✅ |
+| INT64 | ✅ | ✅ | | | ✅ |
+| INT96 (1) | ✅ | ✅ | | | ✅ |
+| FLOAT | ✅ | ✅ | | | ✅ |
+| DOUBLE | ✅ | ✅ | | | ✅ |
+| BYTE_ARRAY | ✅ | ✅ | | | ✅ |
+| FIXED_LEN_BYTE_ARRAY | ✅ | ✅ | | | ✅ |
* \(1) This type is deprecated, but as of 2024 it's common in currently produced parquet files
@@ -45,64 +45,66 @@ Implementations:
| Data type | C++ | Java | Go | Rust | cuDF |
| ----------------------------------------- | ----- | ----- | ----- | ----- | ----- |
-| STRING | | | | | ✅ |
-| ENUM | | | | | ❌ |
-| UUID | | | | | ❌ |
-| 8, 16, 32, 64 bit signed and unsigned INT | | | | | ✅ |
-| DECIMAL (INT32) | | | | | ✅ |
-| DECIMAL (INT64) | | | | | ✅ |
-| DECIMAL (BYTE_ARRAY) | | | | | ✅ |
-| DECIMAL (FIXED_LEN_BYTE_ARRAY) | | | | | ✅ |
-| DATE | | | | | ✅ |
-| TIME (INT32) | | | | | ✅ |
-| TIME (INT64) | | | | | ✅ |
-| TIMESTAMP (INT64) | | | | | ✅ |
-| INTERVAL | | | | | ❌ |
-| JSON | | | | | ❌ |
-| BSON | | | | | ❌ |
-| LIST | | | | | ✅ |
-| MAP | | | | | ✅ |
-| UNKNOWN (always null) | | | | | ✅ |
-| FLOAT16 | | | | | ✅ |
+| STRING | ✅ | ✅ | | | ✅ |
+| ENUM | ❌ | ✅ | | | ❌ |
+| UUID | ❌ | ✅ | | | ❌ |
+| 8, 16, 32, 64 bit signed and unsigned INT | ✅ | ✅ | | | ✅ |
+| DECIMAL (INT32) | ✅ | ✅ | | | ✅ |
+| DECIMAL (INT64) | ✅ | ✅ | | | ✅ |
+| DECIMAL (BYTE_ARRAY) | ✅ | ✅ | | | ✅ |
+| DECIMAL (FIXED_LEN_BYTE_ARRAY) | ✅ | ✅ | | | ✅ |
+| DATE | ✅ | ✅ | | | ✅ |
+| TIME (INT32) | ✅ | ✅ | | | ✅ |
+| TIME (INT64) | ✅ | ✅ | | | ✅ |
+| TIMESTAMP (INT64) | ✅ | ✅ | | | ✅ |
+| INTERVAL | ✅ | ✅(*)| | | ❌ |
+| JSON | ✅ | ✅(*)| | | ❌ |
+| BSON | ❌ | ✅(*)| | | ❌ |
+| LIST | ✅ | ✅ | | | ✅ |
+| MAP | ✅ | ✅ | | | ✅ |
+| UNKNOWN (always null) | ✅ | ✅ | | | ✅ |
+| FLOAT16 | ✅ | ✅(*)| | | ✅ |
+
+(*): Only supported to use its annotated physical type
### Encodings
| Encoding | C++ | Java | Go | Rust | cuDF |
| ----------------------------------------- | ----- | ----- | ----- | ----- | ----- |
-| PLAIN | | | | | ✅ |
-| PLAIN_DICTIONARY | | | | | ✅ |
-| RLE_DICTIONARY | | | | | ✅ |
-| RLE | | | | | ✅ |
-| BIT_PACKED (deprecated) | | | | | (R) |
-| DELTA_BINARY_PACKED | | | | | ✅ |
-| DELTA_LENGTH_BYTE_ARRAY | | | | | ✅ |
-| DELTA_BYTE_ARRAY | | | | | ✅ |
-| BYTE_STREAM_SPLIT | | | | | ✅ |
+| PLAIN | ✅ | ✅ | | | ✅ |
+| PLAIN_DICTIONARY | ✅ | ✅ | | | ✅ |
+| RLE_DICTIONARY | ✅ | ✅ | | | ✅ |
+| RLE | ✅ | ✅ | | | ✅ |
+| BIT_PACKED (deprecated) | ✅ | ✅ | | | (R) |
+| DELTA_BINARY_PACKED | ✅ | ✅ | | | ✅ |
+| DELTA_LENGTH_BYTE_ARRAY | ✅ | ✅ | | | ✅ |
+| DELTA_BYTE_ARRAY | ✅ | ✅ | | | ✅ |
+| BYTE_STREAM_SPLIT | ✅ | ✅ | | | ✅ |
### Compressions
| Compression | C++ | Java | Go | Rust | cuDF |
| ----------------------------------------- | ----- | ----- | ----- | ----- | ----- |
-| UNCOMPRESSED | | | | | ✅ |
-| BROTLI | | | | | (R) |
-| GZIP | | | | | (R) |
-| LZ4 (deprecated) | | | | | ❌ |
-| LZ4_RAW | | | | | ✅ |
-| LZO | | | | | ❌ |
-| SNAPPY | | | | | ✅ |
-| ZSTD | | | | | ✅ |
+| UNCOMPRESSED | ✅ | ✅ | | | ✅ |
+| BROTLI | ✅ | ✅ | | | (R) |
+| GZIP | ✅ | ✅ | | | (R) |
+| LZ4 (deprecated) | ✅ | ❌ | | | ❌ |
+| LZ4_RAW | ✅ | ✅ | | | ✅ |
+| LZO | ❌ | ❌ | | | ❌ |
+| SNAPPY | ✅ | ✅ | | | ✅ |
+| ZSTD | ✅ | ✅ | | | ✅ |
### Other format level features
| | C++ | Java | Go | Rust | cuDF |
| ----------------------------------------- | ----- | ----- | ----- | ----- | ----- |
-| xxHash-based bloom filters | | | | | (R) |
-| Bloom filter length (1) | | | | | (R) |
-| Statistics min_value, max_value | | | | | ✅ |
-| Page index | | | | | ✅ |
-| Page CRC32 checksum | | | | | ❌ |
-| Modular encryption | | | | | ❌ |
-| Size statistics (2) | | | | | ✅ |
+| xxHash-based bloom filters | (R) | ✅ | | | (R) |
+| Bloom filter length (1) | (R) | ✅ | | | (R) |
+| Statistics min_value, max_value | ✅ | ✅ | | | ✅ |
+| Page index | ✅ | ✅ | | | ✅ |
+| Page CRC32 checksum | ✅ | ✅ | | | ❌ |
+| Modular encryption | ✅ | ✅ | | | ❌ |
+| Size statistics (2) | ✅ | ✅ | | | ✅ |
* \(1) In parquet.thrift: ColumnMetaData->bloom_filter_length
@@ -113,12 +115,12 @@ Implementations:
| Format | C++ | Java | Go | Rust | cuDF |
| -------------------------------------------- | ----- | ----- | ----- | ----- | ----- |
-| External column data (1) | | | | | (W) |
-| Row group "Sorting column" metadata (2) | | | | | (W) |
-| Row group pruning using statistics | | | | | ✅ |
-| Row group pruning using bloom filter | | | | | ✅ |
-| Reading select columns only | | | | | ✅ |
-| Page pruning using statistics | | | | | ❌ |
+| External column data (1) | ✅ | ✅ | | | (W) |
+| Row group "Sorting column" metadata (2) | ✅ | ❌ | | | (W) |
+| Row group pruning using statistics | ❌ | ✅ | | | ✅ |
+| Row group pruning using bloom filter | ❌ | ✅ | | | ✅ |
+| Reading select columns only | ✅ | ✅ | | | ✅ |
+| Page pruning using statistics | ❌ | ✅ | | | ❌ |
* \(1) In parquet.thrift: ColumnChunk->file_path
