etseidl commented on code in PR #99:
URL: https://github.com/apache/parquet-site/pull/99#discussion_r1939566940
##########
content/en/docs/File Format/implementationstatus.md:
##########
@@ -13,94 +13,96 @@ implementations.
 The value in each box means:

 * ✅: supported
 * ❌: not supported
+* (R/W): partial reader/writer only support
 * (blank) no data

 Implementations:

 * `C++`: [parquet-cpp](https://github.com/apache/arrow/tree/main/cpp/src/parquet)
 * `Java`: [parquet-java](https://github.com/apache/parquet-java)
 * `Go`: [parquet-go](https://github.com/apache/arrow-go/tree/main/parquet)
 * `Rust`: [parquet-rs](https://github.com/apache/arrow-rs/blob/main/parquet/README.md)
+* `CUDA C++`: [cudf](https://github.com/rapidsai/cudf)

 ### Physical types

-| Data type | C++ | Java | Go | Rust |
-| ----------------------------------------- | ----- | ------ | ----- | ----- |
-| BOOLEAN | | | | |
-| INT32 | | | | |
-| INT64 | | | | |
-| INT96 (1) | | | | |
-| FLOAT | | | | |
-| DOUBLE | | | | |
-| BYTE_ARRAY | | | | |
-| FIXED_LEN_BYTE_ARRAY | | | | |
+| Data type | C++ | Java | Go | Rust | CUDA C++ |
+| ----------------------------------------- | ----- | ------ | ----- | ----- | -------- |
+| BOOLEAN | | | | | ✅ |
+| INT32 | | | | | ✅ |
+| INT64 | | | | | ✅ |
+| INT96 (1) | | | | | ✅ |
+| FLOAT | | | | | ✅ |
+| DOUBLE | | | | | ✅ |
+| BYTE_ARRAY | | | | | ✅ |
+| FIXED_LEN_BYTE_ARRAY | | | | | ✅ |

 * \(1) This type is deprecated, but as of 2024 it's common in currently produced parquet files

 ### Logical types

-| Data type | C++ | Java | Go | Rust |
-| ----------------------------------------- | ----- | ------ | ----- | ----- |
-| STRING | | | | |
-| ENUM | | | | |
-| UUID | | | | |
-| 8, 16, 32, 64 bit signed and unsigned INT | | | | |
-| DECIMAL (INT32) | | | | |
-| DECIMAL (INT64) | | | | |
-| DECIMAL (BYTE_ARRAY) | | | | |
-| DECIMAL (FIXED_LEN_BYTE_ARRAY) | | | | |
-| DATE | | | | |
-| TIME (INT32) | | | | |
-| TIME (INT64) | | | | |
-| TIMESTAMP (INT64) | | | | |
-| INTERVAL | | | | |
-| JSON | | | | |
-| BSON | | | | |
-| LIST | | | | |
-| MAP | | | | |
-| UNKNOWN (always null) | | | | |
-| FLOAT16 | | | | |
+| Data type | C++ | Java | Go | Rust | CUDA C++ |
+| ----------------------------------------- | ----- | ------ | ----- | ----- | -------- |
+| STRING | | | | | ✅ |
+| ENUM | | | | | ❌ |
+| UUID | | | | | ❌ |
+| 8, 16, 32, 64 bit signed and unsigned INT | | | | | ✅ |
+| DECIMAL (INT32) | | | | | ✅ |
+| DECIMAL (INT64) | | | | | ✅ |
+| DECIMAL (BYTE_ARRAY) | | | | | ✅ |
+| DECIMAL (FIXED_LEN_BYTE_ARRAY) | | | | | ✅ |
+| DATE | | | | | ✅ |
+| TIME (INT32) | | | | | ✅ |
+| TIME (INT64) | | | | | ✅ |
+| TIMESTAMP (INT64) | | | | | ✅ |
+| INTERVAL | | | | | ❌ |
+| JSON | | | | | ❌ |
+| BSON | | | | | ❌ |
+| LIST | | | | | ✅ |
+| MAP | | | | | ✅ |
+| UNKNOWN (always null) | | | | | ✅ |
+| FLOAT16 | | | | | ✅ |

 ### Encodings

-| Encoding | C++ | Java | Go | Rust |
-| ----------------------------------------- | ----- | ------ | ----- | ----- |
-| PLAIN | | | | |
-| PLAIN_DICTIONARY | | | | |
-| RLE_DICTIONARY | | | | |
-| RLE | | | | |
-| BIT_PACKED (deprecated) | | | | |
-| DELTA_BINARY_PACKED | | | | |
-| DELTA_LENGTH_BYTE_ARRAY | | | | |
-| DELTA_BYTE_ARRAY | | | | |
-| BYTE_STREAM_SPLIT | | | | |
+| Encoding | C++ | Java | Go | Rust | CUDA C++ |
+| ----------------------------------------- | ----- | ------ | ----- | ----- | -------- |
+| PLAIN | | | | | ✅ |
+| PLAIN_DICTIONARY | | | | | ✅ |
+| RLE_DICTIONARY | | | | | ✅ |
+| RLE | | | | | ✅ |
+| BIT_PACKED (deprecated) | | | | | (R) |
+| DELTA_BINARY_PACKED | | | | | ✅ |
+| DELTA_LENGTH_BYTE_ARRAY | | | | | ✅ |
+| DELTA_BYTE_ARRAY | | | | | ✅ |
+| BYTE_STREAM_SPLIT | | | | | ✅ |

 ### Compressions

-| Compression | C++ | Java | Go | Rust |
-| ----------------------------------------- | ----- | ------ | ----- | ----- |
-| UNCOMPRESSED | | | | |
-| BROTLI | | | | |
-| GZIP | | | | |
-| LZ4 (deprecated) | | | | |
-| LZ4_RAW | | | | |
-| LZO | | | | |
-| SNAPPY | | | | |
-| ZSTD | | | | |
+| Compression | C++ | Java | Go | Rust | CUDA C++ |
+| ----------------------------------------- | ----- | ------ | ----- | ----- | -------- |
+| UNCOMPRESSED | | | | | ✅ |
+| BROTLI | | | | | (R) |
+| GZIP | | | | | (R) |
+| LZ4 (deprecated) | | | | | ❌ |
+| LZ4_RAW | | | | | ✅ |
+| LZO | | | | | ❌ |
+| SNAPPY | | | | | ✅ |
+| ZSTD | | | | | ✅ |

 ### Other format level features

-| | C++ | Java | Go | Rust |
-| ----------------------------------------- | ----- | ------ | ----- | ----- |
-| xxxHash-based bloom filters | | | | |
-| Bloom filter length (1) | | | | |
-| Statistics min_value, max_value | | | | |
-| Page index | | | | |
-| Page CRC32 checksum | | | | |
-| Modular encryption | | | | |
-| Size statistics (2) | | | | |
+| | C++ | Java | Go | Rust | CUDA C++ |
+| ----------------------------------------- | ----- | ------ | ----- | ----- | -------- |
+| xxHash-based bloom filters | | | | | (R) |
+| Bloom filter length (1) | | | | | (R) |
+| Statistics min_value, max_value | | | | | ✅ |
+| Page index | | | | | ❌ |
+| Page CRC32 checksum | | | | | ❌ |
+| Modular encryption | | | | | ❌ |
+| Size statistics (2) | | | | | ❌ |

Review Comment:
   This should be a check as well


##########
content/en/docs/File Format/implementationstatus.md:
##########
@@ -13,94 +13,96 @@ implementations.
 The value in each box means:

 * ✅: supported
 * ❌: not supported
+* (R/W): partial reader/writer only support
 * (blank) no data

 Implementations:

 * `C++`: [parquet-cpp](https://github.com/apache/arrow/tree/main/cpp/src/parquet)
 * `Java`: [parquet-java](https://github.com/apache/parquet-java)
 * `Go`: [parquet-go](https://github.com/apache/arrow-go/tree/main/parquet)
 * `Rust`: [parquet-rs](https://github.com/apache/arrow-rs/blob/main/parquet/README.md)
+* `CUDA C++`: [cudf](https://github.com/rapidsai/cudf)

 ### Physical types

-| Data type | C++ | Java | Go | Rust |
-| ----------------------------------------- | ----- | ------ | ----- | ----- |
-| BOOLEAN | | | | |
-| INT32 | | | | |
-| INT64 | | | | |
-| INT96 (1) | | | | |
-| FLOAT | | | | |
-| DOUBLE | | | | |
-| BYTE_ARRAY | | | | |
-| FIXED_LEN_BYTE_ARRAY | | | | |
+| Data type | C++ | Java | Go | Rust | CUDA C++ |
+| ----------------------------------------- | ----- | ------ | ----- | ----- | -------- |
+| BOOLEAN | | | | | ✅ |
+| INT32 | | | | | ✅ |
+| INT64 | | | | | ✅ |
+| INT96 (1) | | | | | ✅ |
+| FLOAT | | | | | ✅ |
+| DOUBLE | | | | | ✅ |
+| BYTE_ARRAY | | | | | ✅ |
+| FIXED_LEN_BYTE_ARRAY | | | | | ✅ |

 * \(1) This type is deprecated, but as of 2024 it's common in currently produced parquet files

 ### Logical types

-| Data type | C++ | Java | Go | Rust |
-| ----------------------------------------- | ----- | ------ | ----- | ----- |
-| STRING | | | | |
-| ENUM | | | | |
-| UUID | | | | |
-| 8, 16, 32, 64 bit signed and unsigned INT | | | | |
-| DECIMAL (INT32) | | | | |
-| DECIMAL (INT64) | | | | |
-| DECIMAL (BYTE_ARRAY) | | | | |
-| DECIMAL (FIXED_LEN_BYTE_ARRAY) | | | | |
-| DATE | | | | |
-| TIME (INT32) | | | | |
-| TIME (INT64) | | | | |
-| TIMESTAMP (INT64) | | | | |
-| INTERVAL | | | | |
-| JSON | | | | |
-| BSON | | | | |
-| LIST | | | | |
-| MAP | | | | |
-| UNKNOWN (always null) | | | | |
-| FLOAT16 | | | | |
+| Data type | C++ | Java | Go | Rust | CUDA C++ |
+| ----------------------------------------- | ----- | ------ | ----- | ----- | -------- |
+| STRING | | | | | ✅ |
+| ENUM | | | | | ❌ |
+| UUID | | | | | ❌ |
+| 8, 16, 32, 64 bit signed and unsigned INT | | | | | ✅ |
+| DECIMAL (INT32) | | | | | ✅ |
+| DECIMAL (INT64) | | | | | ✅ |
+| DECIMAL (BYTE_ARRAY) | | | | | ✅ |
+| DECIMAL (FIXED_LEN_BYTE_ARRAY) | | | | | ✅ |
+| DATE | | | | | ✅ |
+| TIME (INT32) | | | | | ✅ |
+| TIME (INT64) | | | | | ✅ |
+| TIMESTAMP (INT64) | | | | | ✅ |
+| INTERVAL | | | | | ❌ |
+| JSON | | | | | ❌ |
+| BSON | | | | | ❌ |
+| LIST | | | | | ✅ |
+| MAP | | | | | ✅ |
+| UNKNOWN (always null) | | | | | ✅ |
+| FLOAT16 | | | | | ✅ |

 ### Encodings

-| Encoding | C++ | Java | Go | Rust |
-| ----------------------------------------- | ----- | ------ | ----- | ----- |
-| PLAIN | | | | |
-| PLAIN_DICTIONARY | | | | |
-| RLE_DICTIONARY | | | | |
-| RLE | | | | |
-| BIT_PACKED (deprecated) | | | | |
-| DELTA_BINARY_PACKED | | | | |
-| DELTA_LENGTH_BYTE_ARRAY | | | | |
-| DELTA_BYTE_ARRAY | | | | |
-| BYTE_STREAM_SPLIT | | | | |
+| Encoding | C++ | Java | Go | Rust | CUDA C++ |
+| ----------------------------------------- | ----- | ------ | ----- | ----- | -------- |
+| PLAIN | | | | | ✅ |
+| PLAIN_DICTIONARY | | | | | ✅ |
+| RLE_DICTIONARY | | | | | ✅ |
+| RLE | | | | | ✅ |
+| BIT_PACKED (deprecated) | | | | | (R) |
+| DELTA_BINARY_PACKED | | | | | ✅ |
+| DELTA_LENGTH_BYTE_ARRAY | | | | | ✅ |
+| DELTA_BYTE_ARRAY | | | | | ✅ |
+| BYTE_STREAM_SPLIT | | | | | ✅ |

 ### Compressions

-| Compression | C++ | Java | Go | Rust |
-| ----------------------------------------- | ----- | ------ | ----- | ----- |
-| UNCOMPRESSED | | | | |
-| BROTLI | | | | |
-| GZIP | | | | |
-| LZ4 (deprecated) | | | | |
-| LZ4_RAW | | | | |
-| LZO | | | | |
-| SNAPPY | | | | |
-| ZSTD | | | | |
+| Compression | C++ | Java | Go | Rust | CUDA C++ |
+| ----------------------------------------- | ----- | ------ | ----- | ----- | -------- |
+| UNCOMPRESSED | | | | | ✅ |
+| BROTLI | | | | | (R) |
+| GZIP | | | | | (R) |
+| LZ4 (deprecated) | | | | | ❌ |
+| LZ4_RAW | | | | | ✅ |
+| LZO | | | | | ❌ |
+| SNAPPY | | | | | ✅ |
+| ZSTD | | | | | ✅ |

 ### Other format level features

-| | C++ | Java | Go | Rust |
-| ----------------------------------------- | ----- | ------ | ----- | ----- |
-| xxxHash-based bloom filters | | | | |
-| Bloom filter length (1) | | | | |
-| Statistics min_value, max_value | | | | |
-| Page index | | | | |
-| Page CRC32 checksum | | | | |
-| Modular encryption | | | | |
-| Size statistics (2) | | | | |
+| | C++ | Java | Go | Rust | CUDA C++ |
+| ----------------------------------------- | ----- | ------ | ----- | ----- | -------- |
+| xxHash-based bloom filters | | | | | (R) |
+| Bloom filter length (1) | | | | | (R) |
+| Statistics min_value, max_value | | | | | ✅ |
+| Page index | | | | | ❌ |

Review Comment:
   cuDF can read and write page indexes
