etseidl commented on code in PR #34:
URL: https://github.com/apache/parquet-site/pull/34#discussion_r1633838679

##########
content/en/docs/File Format/implementationstatus.md:
##########
@@ -0,0 +1,101 @@
+---
+title: "Implementation status"
+linkTitle: "Implementation status"
+weight: 8
+---
+### Physical types
+
+| Data type | C++ | Java | Go | Rust |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| BOOLEAN | | | | |
+| INT32 | | | | |
+| INT64 | | | | |
+| INT96 | | | | |
+| FLOAT | | | | |
+| DOUBLE | | | | |
+| BYTE_ARRAY | | | | |
+| FIXED_LEN_BYTE_ARRAY | | | | |
+
+### Logical types
+
+| Data type | C++ | Java | Go | Rust |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| STRING | | | | |
+| ENUM | | | | |
+| UUID | | | | |
+| 8 and 16 bit signed INT | | | | |
+| 8, 16, 32, 64 bit unsigned INT | | | | |
+| DECIMAL (INT32) | | | | |
+| DECIMAL (INT64) | | | | |
+| DECIMAL (BYTE_ARRAY) | | | | |
+| DECIMAL (FIXED_LEN_BYTE_ARRAY) | | | | |
+| DATE | | | | |
+| TIME (INT32) | | | | |
+| TIME (INT64) | | | | |
+| TIMESTAMP (INT32) | | | | |
+| TIMESTAMP (INT64) | | | | |
+| INTERVAL | | | | |
+| JSON | | | | |
+| BSON | | | | |
+| LIST | | | | |
+| MAP | | | | |
+| UNKNOWN | | | | |
+
+### Encoding
+
+| Encoding | C++ | Java | Go | Rust |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| PLAIN | | | | |
+| PLAIN_DICTIONARY | | | | |
+| RLE_DICTIONARY | | | | |
+| RLE | | | | |
+| BIT_PACKED | | | | |
+| DELTA_BINARY_PACKED | | | | |
+| DELTA_LENGTH_BYTE_ARRAY | | | | |
+| DELTA_BYTE_ARRAY | | | | |
+| BYTE_STREAM_SPLIT | | | | |

Review Comment:
   Should this be split into float/double and int/fixed_len_byte_array, or just use notes if an implementation doesn't yet support the expanded set of data types?

##########
content/en/docs/File Format/implementationstatus.md:
##########
@@ -0,0 +1,101 @@
+---
+title: "Implementation status"
+linkTitle: "Implementation status"
+weight: 8
+---
+### Physical types
+
+| Data type | C++ | Java | Go | Rust |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| BOOLEAN | | | | |
+| INT32 | | | | |
+| INT64 | | | | |
+| INT96 | | | | |

Review Comment:
   Should it be noted that INT96 is deprecated?

##########
content/en/docs/File Format/implementationstatus.md:
##########
@@ -0,0 +1,101 @@
+---
+title: "Implementation status"
+linkTitle: "Implementation status"
+weight: 8
+---
+### Physical types
+
+| Data type | C++ | Java | Go | Rust |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| BOOLEAN | | | | |
+| INT32 | | | | |
+| INT64 | | | | |
+| INT96 | | | | |
+| FLOAT | | | | |
+| DOUBLE | | | | |
+| BYTE_ARRAY | | | | |
+| FIXED_LEN_BYTE_ARRAY | | | | |
+
+### Logical types
+
+| Data type | C++ | Java | Go | Rust |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| STRING | | | | |
+| ENUM | | | | |
+| UUID | | | | |
+| 8 and 16 bit signed INT | | | | |
+| 8, 16, 32, 64 bit unsigned INT | | | | |
+| DECIMAL (INT32) | | | | |
+| DECIMAL (INT64) | | | | |
+| DECIMAL (BYTE_ARRAY) | | | | |
+| DECIMAL (FIXED_LEN_BYTE_ARRAY) | | | | |
+| DATE | | | | |
+| TIME (INT32) | | | | |
+| TIME (INT64) | | | | |
+| TIMESTAMP (INT32) | | | | |
+| TIMESTAMP (INT64) | | | | |
+| INTERVAL | | | | |
+| JSON | | | | |
+| BSON | | | | |
+| LIST | | | | |
+| MAP | | | | |
+| UNKNOWN | | | | |
+
+### Encoding
+
+| Encoding | C++ | Java | Go | Rust |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| PLAIN | | | | |
+| PLAIN_DICTIONARY | | | | |
+| RLE_DICTIONARY | | | | |
+| RLE | | | | |
+| BIT_PACKED | | | | |
+| DELTA_BINARY_PACKED | | | | |
+| DELTA_LENGTH_BYTE_ARRAY | | | | |
+| DELTA_BYTE_ARRAY | | | | |
+| BYTE_STREAM_SPLIT | | | | |
+
+### Compression
+
+| Compression | C++ | Java | Go | Rust |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| UNCOMPRESSED | | | | |
+| SNAPPY | | | | |
+| GZIP | | | | |
+| LZO | | | | |
+| BROTLI | | | | |
+| LZ4 | | | | |
+| ZSTD | | | | |
+| LZ4_RAW | | | | |
+
+### Other format level features
+
+| | C++ | Java | Go | Rust |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| xxHash Bloom filters | | | | |
+| bloom filter length | | | | |
+| Statistics min_value, max_value | | | | |
+| Column index | | | | |
+| Offset index | | | | |
+| Modular encryption | | | | |
+| Page CRC32 checksum | | | | |
+| Modular encryption | | | | |

Review Comment:
   Add Size Statistics (https://github.com/apache/parquet-format/pull/197)?

##########
content/en/docs/File Format/implementationstatus.md:
##########
@@ -0,0 +1,101 @@
+---
+title: "Implementation status"
+linkTitle: "Implementation status"
+weight: 8
+---
+### Physical types
+
+| Data type | C++ | Java | Go | Rust |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| BOOLEAN | | | | |
+| INT32 | | | | |
+| INT64 | | | | |
+| INT96 | | | | |
+| FLOAT | | | | |
+| DOUBLE | | | | |
+| BYTE_ARRAY | | | | |
+| FIXED_LEN_BYTE_ARRAY | | | | |
+
+### Logical types
+
+| Data type | C++ | Java | Go | Rust |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| STRING | | | | |
+| ENUM | | | | |
+| UUID | | | | |
+| 8 and 16 bit signed INT | | | | |
+| 8, 16, 32, 64 bit unsigned INT | | | | |
+| DECIMAL (INT32) | | | | |
+| DECIMAL (INT64) | | | | |
+| DECIMAL (BYTE_ARRAY) | | | | |
+| DECIMAL (FIXED_LEN_BYTE_ARRAY) | | | | |
+| DATE | | | | |
+| TIME (INT32) | | | | |
+| TIME (INT64) | | | | |
+| TIMESTAMP (INT32) | | | | |
+| TIMESTAMP (INT64) | | | | |
+| INTERVAL | | | | |
+| JSON | | | | |
+| BSON | | | | |
+| LIST | | | | |
+| MAP | | | | |
+| UNKNOWN | | | | |
+
+### Encoding
+
+| Encoding | C++ | Java | Go | Rust |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| PLAIN | | | | |
+| PLAIN_DICTIONARY | | | | |
+| RLE_DICTIONARY | | | | |
+| RLE | | | | |
+| BIT_PACKED | | | | |
+| DELTA_BINARY_PACKED | | | | |
+| DELTA_LENGTH_BYTE_ARRAY | | | | |
+| DELTA_BYTE_ARRAY | | | | |
+| BYTE_STREAM_SPLIT | | | | |
+
+### Compression
+
+| Compression | C++ | Java | Go | Rust |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| UNCOMPRESSED | | | | |
+| SNAPPY | | | | |
+| GZIP | | | | |
+| LZO | | | | |
+| BROTLI | | | | |
+| LZ4 | | | | |
+| ZSTD | | | | |
+| LZ4_RAW | | | | |
+
+### Other format level features
+
+| | C++ | Java | Go | Rust |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| xxHash Bloom filters | | | | |
+| bloom filter length | | | | |
+| Statistics min_value, max_value | | | | |
+| Column index | | | | |
+| Offset index | | | | |

Review Comment:
   Should these be combined as Page Indexes?