bdice commented on code in PR #99:
URL: https://github.com/apache/parquet-site/pull/99#discussion_r1933124319
##########
content/en/docs/File Format/implementationstatus.md:
##########
@@ -13,94 +13,96 @@ implementations.

 The value in each box means:
 * ✅: supported
 * ❌: not supported
+* (R/W): partial reader/writer support
 * (blank) no data

 Implementations:
 * `C++`: [parquet-cpp](https://github.com/apache/arrow/tree/main/cpp/src/parquet)
 * `Java`: [parquet-java](https://github.com/apache/parquet-java)
 * `Go`: [parquet-go](https://github.com/apache/arrow-go/tree/main/parquet)
 * `Rust`: [parquet-rs](https://github.com/apache/arrow-rs/blob/main/parquet/README.md)
+* `CUDA`:[cudf](https://github.com/rapidsai/cudf)

 ### Physical types

-| Data type | C++ | Java | Go | Rust |
-| ----------------------------------------- | ----- | ------ | ----- | ----- |
-| BOOLEAN | | | | |
-| INT32 | | | | |
-| INT64 | | | | |
-| INT96 (1) | | | | |
-| FLOAT | | | | |
-| DOUBLE | | | | |
-| BYTE_ARRAY | | | | |
-| FIXED_LEN_BYTE_ARRAY | | | | |
+| Data type | C++ | Java | Go | Rust | CUDA |
+| ----------------------------------------- | ----- | ------ | ----- | ----- | ----- |
+| BOOLEAN | | | | | ✅ |
+| INT32 | | | | | ✅ |
+| INT64 | | | | | ✅ |
+| INT96 (1) | | | | | ✅ |
+| FLOAT | | | | | ✅ |
+| DOUBLE | | | | | ✅ |
+| BYTE_ARRAY | | | | | ✅ |
+| FIXED_LEN_BYTE_ARRAY | | | | | ✅ |

 * \(1) This type is deprecated, but as of 2024 it's common in currently produced parquet files

 ### Logical types

-| Data type | C++ | Java | Go | Rust |
-| ----------------------------------------- | ----- | ------ | ----- | ----- |
-| STRING | | | | |
-| ENUM | | | | |
-| UUID | | | | |
-| 8, 16, 32, 64 bit signed and unsigned INT | | | | |
-| DECIMAL (INT32) | | | | |
-| DECIMAL (INT64) | | | | |
-| DECIMAL (BYTE_ARRAY) | | | | |
-| DECIMAL (FIXED_LEN_BYTE_ARRAY) | | | | |
-| DATE | | | | |
-| TIME (INT32) | | | | |
-| TIME (INT64) | | | | |
-| TIMESTAMP (INT64) | | | | |
-| INTERVAL | | | | |
-| JSON | | | | |
-| BSON | | | | |
-| LIST | | | | |
-| MAP | | | | |
-| UNKNOWN (always null) | | | | |
-| FLOAT16 | | | | |
+| Data type | C++ | Java | Go | Rust | CUDA |
+| ----------------------------------------- | ----- | ------ | ----- | ----- | ----- |
+| STRING | | | | | ✅ |
+| ENUM | | | | | ❌ |
+| UUID | | | | | ❌ |
+| 8, 16, 32, 64 bit signed and unsigned INT | | | | | ✅ |
+| DECIMAL (INT32) | | | | | ✅ |
+| DECIMAL (INT64) | | | | | ✅ |
+| DECIMAL (BYTE_ARRAY) | | | | | ✅ |
+| DECIMAL (FIXED_LEN_BYTE_ARRAY) | | | | | ✅ |
+| DATE | | | | | ✅ |
+| TIME (INT32) | | | | | ✅ |
+| TIME (INT64) | | | | | ✅ |
+| TIMESTAMP (INT64) | | | | | ✅ |
+| INTERVAL | | | | | ❌ |
+| JSON | | | | | ❌ |
+| BSON | | | | | ❌ |
+| LIST | | | | | ✅ |
+| MAP | | | | | ✅ |
+| UNKNOWN (always null) | | | | | ✅ |
+| FLOAT16 | | | | | ✅ |

 ### Encodings

-| Encoding | C++ | Java | Go | Rust |
-| ----------------------------------------- | ----- | ------ | ----- | ----- |
-| PLAIN | | | | |
-| PLAIN_DICTIONARY | | | | |
-| RLE_DICTIONARY | | | | |
-| RLE | | | | |
-| BIT_PACKED (deprecated) | | | | |
-| DELTA_BINARY_PACKED | | | | |
-| DELTA_LENGTH_BYTE_ARRAY | | | | |
-| DELTA_BYTE_ARRAY | | | | |
-| BYTE_STREAM_SPLIT | | | | |
+| Encoding | C++ | Java | Go | Rust | CUDA |
+| ----------------------------------------- | ----- | ------ | ----- | ----- | ----- |
+| PLAIN | | | | | ✅ |
+| PLAIN_DICTIONARY | | | | | ✅ |
+| RLE_DICTIONARY | | | | | ✅ |
+| RLE | | | | | ❌ |
+| BIT_PACKED (deprecated) | | | | | ❌ |
+| DELTA_BINARY_PACKED | | | | | ✅ |
+| DELTA_LENGTH_BYTE_ARRAY | | | | | ✅ |
+| DELTA_BYTE_ARRAY | | | | | ✅ |
+| BYTE_STREAM_SPLIT | | | | | ✅ |

 ### Compressions

-| Compression | C++ | Java | Go | Rust |
-| ----------------------------------------- | ----- | ------ | ----- | ----- |
-| UNCOMPRESSED | | | | |
-| BROTLI | | | | |
-| GZIP | | | | |
-| LZ4 (deprecated) | | | | |
-| LZ4_RAW | | | | |
-| LZO | | | | |
-| SNAPPY | | | | |
-| ZSTD | | | | |
+| Compression | C++ | Java | Go | Rust | CUDA |
+| ----------------------------------------- | ----- | ------ | ----- | ----- | ----- |
+| UNCOMPRESSED | | | | | ✅ |
+| BROTLI | | | | | ❌ |
+| GZIP | | | | | ❌ |
+| LZ4 (deprecated) | | | | | ✅ |
+| LZ4_RAW | | | | | ❌ |
+| LZO | | | | | ❌ |
+| SNAPPY | | | | | ✅ |
+| ZSTD | | | | | ✅ |

 ### Other format level features

-| | C++ | Java | Go | Rust |
-| ----------------------------------------- | ----- | ------ | ----- | ----- |
-| xxxHash-based bloom filters | | | | |
-| Bloom filter length (1) | | | | |
-| Statistics min_value, max_value | | | | |
-| Page index | | | | |
-| Page CRC32 checksum | | | | |
-| Modular encryption | | | | |
-| Size statistics (2) | | | | |
+| | C++ | Java | Go | Rust | CUDA |
+| ----------------------------------------- | ----- | ------ | ----- | ----- | ----- |
+| xxxHash-based bloom filters | | | | | ✅ |

Review Comment:
   ```suggestion
   | xxHash-based bloom filters | | | | | ✅ |
   ```

##########
content/en/docs/File Format/implementationstatus.md:
##########
@@ -13,94 +13,96 @@ implementations.

 The value in each box means:
 * ✅: supported
 * ❌: not supported
+* (R/W): partial reader/writer support
 * (blank) no data

 Implementations:
 * `C++`: [parquet-cpp](https://github.com/apache/arrow/tree/main/cpp/src/parquet)
 * `Java`: [parquet-java](https://github.com/apache/parquet-java)
 * `Go`: [parquet-go](https://github.com/apache/arrow-go/tree/main/parquet)
 * `Rust`: [parquet-rs](https://github.com/apache/arrow-rs/blob/main/parquet/README.md)
+* `CUDA`:[cudf](https://github.com/rapidsai/cudf)

Review Comment:
   Should this say CUDA C++, or CUDA (C++/Python/Java), or something like that to indicate the language bindings from which cudf APIs can be called?
   
   ```suggestion
   * `CUDA`: [cudf](https://github.com/rapidsai/cudf)
   ```

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
