This is an automated email from the ASF dual-hosted git repository.
gangwu pushed a commit to branch production
in repository https://gitbox.apache.org/repos/asf/parquet-site.git
The following commit(s) were added to refs/heads/production by this push:
new 4557062 Add implementation status for cuDF (#99)
4557062 is described below
commit 4557062b324902b0543855cda7746621abe3fcd5
Author: Muhammad Haseeb <[email protected]>
AuthorDate: Wed Feb 5 02:01:33 2025 -0800
Add implementation status for cuDF (#99)
Co-authored-by: Bradley Dice <[email protected]>
---
.../en/docs/File Format/implementationstatus.md | 140 +++++++++++----------
1 file changed, 71 insertions(+), 69 deletions(-)
diff --git a/content/en/docs/File Format/implementationstatus.md b/content/en/docs/File Format/implementationstatus.md
index 709cbce..8b32876 100644
--- a/content/en/docs/File Format/implementationstatus.md
+++ b/content/en/docs/File Format/implementationstatus.md
@@ -13,6 +13,7 @@ implementations.
The value in each box means:
* ✅: supported
* ❌: not supported
+* (R/W): partial reader/writer only support
* (blank) no data
Implementations:
@@ -20,87 +21,88 @@ Implementations:
* `Java`: [parquet-java](https://github.com/apache/parquet-java)
* `Go`: [parquet-go](https://github.com/apache/arrow-go/tree/main/parquet)
* `Rust`: [parquet-rs](https://github.com/apache/arrow-rs/blob/main/parquet/README.md)
+* `cuDF`: [cudf](https://github.com/rapidsai/cudf)
### Physical types
-| Data type | C++ | Java | Go | Rust |
-| ----------------------------------------- | ----- | ------ | ----- | ----- |
-| BOOLEAN | | | | |
-| INT32 | | | | |
-| INT64 | | | | |
-| INT96 (1) | | | | |
-| FLOAT | | | | |
-| DOUBLE | | | | |
-| BYTE_ARRAY | | | | |
-| FIXED_LEN_BYTE_ARRAY | | | | |
+| Data type | C++ | Java | Go | Rust | cuDF |
+| ----------------------------------------- | ----- | ----- | ----- | ----- | ----- |
+| BOOLEAN | | | | | ✅ |
+| INT32 | | | | | ✅ |
+| INT64 | | | | | ✅ |
+| INT96 (1) | | | | | ✅ |
+| FLOAT | | | | | ✅ |
+| DOUBLE | | | | | ✅ |
+| BYTE_ARRAY | | | | | ✅ |
+| FIXED_LEN_BYTE_ARRAY | | | | | ✅ |
* \(1) This type is deprecated, but as of 2024 it's common in currently produced parquet files
### Logical types
-| Data type | C++ | Java | Go | Rust |
-| ----------------------------------------- | ----- | ------ | ----- | ----- |
-| STRING | | | | |
-| ENUM | | | | |
-| UUID | | | | |
-| 8, 16, 32, 64 bit signed and unsigned INT | | | | |
-| DECIMAL (INT32) | | | | |
-| DECIMAL (INT64) | | | | |
-| DECIMAL (BYTE_ARRAY) | | | | |
-| DECIMAL (FIXED_LEN_BYTE_ARRAY) | | | | |
-| DATE | | | | |
-| TIME (INT32) | | | | |
-| TIME (INT64) | | | | |
-| TIMESTAMP (INT64) | | | | |
-| INTERVAL | | | | |
-| JSON | | | | |
-| BSON | | | | |
-| LIST | | | | |
-| MAP | | | | |
-| UNKNOWN (always null) | | | | |
-| FLOAT16 | | | | |
+| Data type | C++ | Java | Go | Rust | cuDF |
+| ----------------------------------------- | ----- | ----- | ----- | ----- | ----- |
+| STRING | | | | | ✅ |
+| ENUM | | | | | ❌ |
+| UUID | | | | | ❌ |
+| 8, 16, 32, 64 bit signed and unsigned INT | | | | | ✅ |
+| DECIMAL (INT32) | | | | | ✅ |
+| DECIMAL (INT64) | | | | | ✅ |
+| DECIMAL (BYTE_ARRAY) | | | | | ✅ |
+| DECIMAL (FIXED_LEN_BYTE_ARRAY) | | | | | ✅ |
+| DATE | | | | | ✅ |
+| TIME (INT32) | | | | | ✅ |
+| TIME (INT64) | | | | | ✅ |
+| TIMESTAMP (INT64) | | | | | ✅ |
+| INTERVAL | | | | | ❌ |
+| JSON | | | | | ❌ |
+| BSON | | | | | ❌ |
+| LIST | | | | | ✅ |
+| MAP | | | | | ✅ |
+| UNKNOWN (always null) | | | | | ✅ |
+| FLOAT16 | | | | | ✅ |
### Encodings
-| Encoding | C++ | Java | Go | Rust |
-| ----------------------------------------- | ----- | ------ | ----- | ----- |
-| PLAIN | | | | |
-| PLAIN_DICTIONARY | | | | |
-| RLE_DICTIONARY | | | | |
-| RLE | | | | |
-| BIT_PACKED (deprecated) | | | | |
-| DELTA_BINARY_PACKED | | | | |
-| DELTA_LENGTH_BYTE_ARRAY | | | | |
-| DELTA_BYTE_ARRAY | | | | |
-| BYTE_STREAM_SPLIT | | | | |
+| Encoding | C++ | Java | Go | Rust | cuDF |
+| ----------------------------------------- | ----- | ----- | ----- | ----- | ----- |
+| PLAIN | | | | | ✅ |
+| PLAIN_DICTIONARY | | | | | ✅ |
+| RLE_DICTIONARY | | | | | ✅ |
+| RLE | | | | | ✅ |
+| BIT_PACKED (deprecated) | | | | | (R) |
+| DELTA_BINARY_PACKED | | | | | ✅ |
+| DELTA_LENGTH_BYTE_ARRAY | | | | | ✅ |
+| DELTA_BYTE_ARRAY | | | | | ✅ |
+| BYTE_STREAM_SPLIT | | | | | ✅ |
### Compressions
-| Compression | C++ | Java | Go | Rust |
-| ----------------------------------------- | ----- | ------ | ----- | ----- |
-| UNCOMPRESSED | | | | |
-| BROTLI | | | | |
-| GZIP | | | | |
-| LZ4 (deprecated) | | | | |
-| LZ4_RAW | | | | |
-| LZO | | | | |
-| SNAPPY | | | | |
-| ZSTD | | | | |
+| Compression | C++ | Java | Go | Rust | cuDF |
+| ----------------------------------------- | ----- | ----- | ----- | ----- | ----- |
+| UNCOMPRESSED | | | | | ✅ |
+| BROTLI | | | | | (R) |
+| GZIP | | | | | (R) |
+| LZ4 (deprecated) | | | | | ❌ |
+| LZ4_RAW | | | | | ✅ |
+| LZO | | | | | ❌ |
+| SNAPPY | | | | | ✅ |
+| ZSTD | | | | | ✅ |
### Other format level features
-| | C++ | Java | Go | Rust |
-| ----------------------------------------- | ----- | ------ | ----- | ----- |
-| xxxHash-based bloom filters | | | | |
-| Bloom filter length (1) | | | | |
-| Statistics min_value, max_value | | | | |
-| Page index | | | | |
-| Page CRC32 checksum | | | | |
-| Modular encryption | | | | |
-| Size statistics (2) | | | | |
+| | C++ | Java | Go | Rust | cuDF |
+| ----------------------------------------- | ----- | ----- | ----- | ----- | ----- |
+| xxHash-based bloom filters | | | | | (R) |
+| Bloom filter length (1) | | | | | (R) |
+| Statistics min_value, max_value | | | | | ✅ |
+| Page index | | | | | ✅ |
+| Page CRC32 checksum | | | | | ❌ |
+| Modular encryption | | | | | ❌ |
+| Size statistics (2) | | | | | ✅ |
* \(1) In parquet.thrift: ColumnMetaData->bloom_filter_length
@@ -109,14 +111,14 @@ Implementations:
### High level data APIs for Parquet feature usage
-| Format | C++ | Java | Go | Rust |
-| -------------------------------------------- | ----- | ------ | ----- | ----- |
-| External column data (1) | | | | |
-| Row group "Sorting column" metadata (2) | | | | |
-| Row group pruning using statistics | | | | |
-| Reading select columns only | | | | |
-| Page pruning using statistics | | | | |
-| Page pruning using bloom filter | | | | |
+| Format | C++ | Java | Go | Rust | cuDF |
+| -------------------------------------------- | ----- | ----- | ----- | ----- | ----- |
+| External column data (1) | | | | | (W) |
+| Row group "Sorting column" metadata (2) | | | | | (W) |
+| Row group pruning using statistics | | | | | ✅ |
+| Row group pruning using bloom filter | | | | | ✅ |
+| Reading select columns only | | | | | ✅ |
+| Page pruning using statistics | | | | | ❌ |
* \(1) In parquet.thrift: ColumnChunk->file_path