alippai commented on code in PR #34: URL: https://github.com/apache/parquet-site/pull/34#discussion_r1236161855
########## content/en/docs/File Format/implementationstatus.md: ########## @@ -0,0 +1,178 @@ +--- +title: "Implementation status" +linkTitle: "Implementation status" +weight: 8 +--- + +### Physical types + ++-------------------------------------------+-------+--------+--------+-------+-------+ +| Data type | C++ | Python | Java | Go | Rust | Review Comment: Actually, UUID is a good counterexample indeed :) ########## content/en/docs/File Format/implementationstatus.md: ########## @@ -0,0 +1,178 @@ +--- +title: "Implementation status" +linkTitle: "Implementation status" +weight: 8 +--- + +### Physical types + ++-------------------------------------------+-------+--------+--------+-------+-------+ +| Data type | C++ | Python | Java | Go | Rust | +| | | | | | | ++===========================================+=======+========+========+=======+=======+ +| BOOLEAN | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| INT32 | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| INT64 | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| INT96 | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| FLOAT | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| DOUBLE | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| BYTE_ARRAY | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| FIXED_LEN_BYTE_ARRAY | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ + +### Logical types + ++-------------------------------------------+-------+--------+--------+-------+-------+ +| Data type | C++ | Python | Java | Go | Rust | +| | | | | | | 
++===========================================+=======+========+========+=======+=======+ +| STRING | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| ENUM | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| UUID | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| 8 and 16 bit signed INT | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| 8, 16, 32, 64 bit unsigned INT | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| DECIMAL (INT32) | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| DECIMAL (INT64) | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| DECIMAL (BYTE_ARRAY) | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| DECIMAL (FIXED_LEN_BYTE_ARRAY) | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| DATE | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| TIME (INT32) | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| TIME (INT64) | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| TIMESTAMP (INT32) | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| TIMESTAMP (INT64) | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| INTERVAL | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| JSON | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| BSON | 
| | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| LIST | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| MAP | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| UNKNOWN | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ + +### Encoding + ++-------------------------------------------+-------+--------+--------+-------+-------+ +| | C++ | Python | Java | Go | Rust | +| | | | | | | ++===========================================+=======+========+========+=======+=======+ +| PLAIN | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| PLAIN_DICTIONARY | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| RLE_DICTIONARY | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| RLE | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| BIT_PACKED | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| DELTA_BINARY_PACKED | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| DELTA_LENGTH_BYTE_ARRAY | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| DELTA_BYTE_ARRAY | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| BYTE_STREAM_SPLIT | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ + +### Compression + ++-------------------------------------------+-------+--------+--------+-------+-------+ +| | C++ | Python | Java | Go | Rust | +| | | | | | | ++===========================================+=======+========+========+=======+=======+ 
+| UNCOMPRESSED | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| SNAPPY | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| GZIP | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| LZO | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| BROTLI | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| LZ4 | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| ZSTD | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| LZ4_RAW | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ + +### Other format level features + ++-------------------------------------------+-------+--------+--------+-------+-------+ +| | C++ | Python | Java | Go | Rust | +| | | | | | | ++===========================================+=======+========+========+=======+=======+ +| xxHash Bloom filters | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| bloom filter length | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| Statistics min_value, max_value | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| Column index | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| Offset index | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| Modular encryption | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| Page CRC32 checksum | | | | | | 
++-------------------------------------------+-------+--------+--------+-------+-------+ +| Modular encryption | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ + +High level data API-s for parquet feature usage +=============================================== + ++----------------------------------------------+-------+--------+--------+-------+-------+ +| Format | C++ | Python | Java | Go | Rust | +| | | | | | | ++==============================================+=======+========+========+=======+=======+ +| Hive-style partitioning | | | | | | ++----------------------------------------------+-------+--------+--------+-------+-------+ +| Partition pruning on the partition column | | | | | | ++----------------------------------------------+-------+--------+--------+-------+-------+ +| External column data | | | | | | ++----------------------------------------------+-------+--------+--------+-------+-------+ +| RowGroup Sorting column | | | | | | ++----------------------------------------------+-------+--------+--------+-------+-------+ +| Read / Write RowGroup metadata and data (1) | | | | | | ++----------------------------------------------+-------+--------+--------+-------+-------+ +| RowGroup pruning using statistics | | | | | | ++----------------------------------------------+-------+--------+--------+-------+-------+ +| Read / Write page metadata and data (2) | | | | | | ++----------------------------------------------+-------+--------+--------+-------+-------+ +| Page pruning using projection pushdown | | | | | | Review Comment: Yes, that's column selection. 
I think this is the academic term, and the Apache Arrow blog also refers to it under the same name: https://arrow.apache.org/blog/2022/12/26/querying-parquet-with-millisecond-latency/#projection-pushdown ########## content/en/docs/File Format/implementationstatus.md: ########## @@ -0,0 +1,178 @@ +--- +title: "Implementation status" +linkTitle: "Implementation status" +weight: 8 +--- + +### Physical types + ++-------------------------------------------+-------+--------+--------+-------+-------+ +| Data type | C++ | Python | Java | Go | Rust | +| | | | | | | ++===========================================+=======+========+========+=======+=======+ +| BOOLEAN | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| INT32 | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| INT64 | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| INT96 | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| FLOAT | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| DOUBLE | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| BYTE_ARRAY | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| FIXED_LEN_BYTE_ARRAY | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ + +### Logical types + ++-------------------------------------------+-------+--------+--------+-------+-------+ +| Data type | C++ | Python | Java | Go | Rust | +| | | | | | | ++===========================================+=======+========+========+=======+=======+ +| STRING | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| ENUM | | | | | | 
++-------------------------------------------+-------+--------+--------+-------+-------+ +| UUID | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| 8 and 16 bit signed INT | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| 8, 16, 32, 64 bit unsigned INT | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| DECIMAL (INT32) | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| DECIMAL (INT64) | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| DECIMAL (BYTE_ARRAY) | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| DECIMAL (FIXED_LEN_BYTE_ARRAY) | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| DATE | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| TIME (INT32) | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| TIME (INT64) | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| TIMESTAMP (INT32) | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| TIMESTAMP (INT64) | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| INTERVAL | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| JSON | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| BSON | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| LIST | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| MAP | | 
| | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| UNKNOWN | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ + +### Encoding + ++-------------------------------------------+-------+--------+--------+-------+-------+ +| | C++ | Python | Java | Go | Rust | +| | | | | | | ++===========================================+=======+========+========+=======+=======+ +| PLAIN | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| PLAIN_DICTIONARY | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| RLE_DICTIONARY | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| RLE | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| BIT_PACKED | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| DELTA_BINARY_PACKED | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| DELTA_LENGTH_BYTE_ARRAY | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| DELTA_BYTE_ARRAY | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| BYTE_STREAM_SPLIT | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ + +### Compression + ++-------------------------------------------+-------+--------+--------+-------+-------+ +| | C++ | Python | Java | Go | Rust | +| | | | | | | ++===========================================+=======+========+========+=======+=======+ +| UNCOMPRESSED | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| SNAPPY | | | | | | 
++-------------------------------------------+-------+--------+--------+-------+-------+ +| GZIP | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| LZO | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| BROTLI | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| LZ4 | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| ZSTD | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| LZ4_RAW | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ + +### Other format level features + ++-------------------------------------------+-------+--------+--------+-------+-------+ +| | C++ | Python | Java | Go | Rust | +| | | | | | | ++===========================================+=======+========+========+=======+=======+ +| xxHash Bloom filters | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| bloom filter length | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| Statistics min_value, max_value | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| Column index | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| Offset index | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| Modular encryption | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| Page CRC32 checksum | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| Modular encryption | | | | | | 
++-------------------------------------------+-------+--------+--------+-------+-------+ + +High level data API-s for parquet feature usage +=============================================== + ++----------------------------------------------+-------+--------+--------+-------+-------+ +| Format | C++ | Python | Java | Go | Rust | +| | | | | | | ++==============================================+=======+========+========+=======+=======+ +| Hive-style partitioning | | | | | | ++----------------------------------------------+-------+--------+--------+-------+-------+ +| Partition pruning on the partition column | | | | | | ++----------------------------------------------+-------+--------+--------+-------+-------+ +| External column data | | | | | | Review Comment: https://github.com/apache/parquet-format/blob/1603152f8991809e8ad29659dffa224b4284f31b/src/main/thrift/parquet.thrift#L789 ########## content/en/docs/File Format/implementationstatus.md: ########## @@ -0,0 +1,178 @@ +--- +title: "Implementation status" +linkTitle: "Implementation status" +weight: 8 +--- + +### Physical types + ++-------------------------------------------+-------+--------+--------+-------+-------+ +| Data type | C++ | Python | Java | Go | Rust | +| | | | | | | ++===========================================+=======+========+========+=======+=======+ +| BOOLEAN | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| INT32 | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| INT64 | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| INT96 | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| FLOAT | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| DOUBLE | | | | | | 
++-------------------------------------------+-------+--------+--------+-------+-------+ +| BYTE_ARRAY | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| FIXED_LEN_BYTE_ARRAY | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ + +### Logical types + ++-------------------------------------------+-------+--------+--------+-------+-------+ +| Data type | C++ | Python | Java | Go | Rust | +| | | | | | | ++===========================================+=======+========+========+=======+=======+ +| STRING | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| ENUM | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| UUID | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| 8 and 16 bit signed INT | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| 8, 16, 32, 64 bit unsigned INT | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| DECIMAL (INT32) | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| DECIMAL (INT64) | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| DECIMAL (BYTE_ARRAY) | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| DECIMAL (FIXED_LEN_BYTE_ARRAY) | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| DATE | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| TIME (INT32) | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| TIME (INT64) | | | | | | 
++-------------------------------------------+-------+--------+--------+-------+-------+ +| TIMESTAMP (INT32) | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| TIMESTAMP (INT64) | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| INTERVAL | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| JSON | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| BSON | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| LIST | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| MAP | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| UNKNOWN | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ + +### Encoding + ++-------------------------------------------+-------+--------+--------+-------+-------+ +| | C++ | Python | Java | Go | Rust | +| | | | | | | ++===========================================+=======+========+========+=======+=======+ +| PLAIN | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| PLAIN_DICTIONARY | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| RLE_DICTIONARY | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| RLE | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| BIT_PACKED | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| DELTA_BINARY_PACKED | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| DELTA_LENGTH_BYTE_ARRAY | | | | | | 
++-------------------------------------------+-------+--------+--------+-------+-------+ +| DELTA_BYTE_ARRAY | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| BYTE_STREAM_SPLIT | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ + +### Compression + ++-------------------------------------------+-------+--------+--------+-------+-------+ +| | C++ | Python | Java | Go | Rust | +| | | | | | | ++===========================================+=======+========+========+=======+=======+ +| UNCOMPRESSED | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| SNAPPY | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| GZIP | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| LZO | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| BROTLI | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| LZ4 | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| ZSTD | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| LZ4_RAW | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ + +### Other format level features + ++-------------------------------------------+-------+--------+--------+-------+-------+ +| | C++ | Python | Java | Go | Rust | +| | | | | | | ++===========================================+=======+========+========+=======+=======+ +| xxHash Bloom filters | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| bloom filter length | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| 
Statistics min_value, max_value | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| Column index | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| Offset index | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| Modular encryption | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| Page CRC32 checksum | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| Modular encryption | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ + +High level data API-s for parquet feature usage +=============================================== + ++----------------------------------------------+-------+--------+--------+-------+-------+ +| Format | C++ | Python | Java | Go | Rust | +| | | | | | | ++==============================================+=======+========+========+=======+=======+ +| Hive-style partitioning | | | | | | ++----------------------------------------------+-------+--------+--------+-------+-------+ +| Partition pruning on the partition column | | | | | | Review Comment: The number of systems that support Parquet but not Hive-style partitioning is very limited. As I explained in the intro, my goal was to emphasize the different capabilities and levels of abstraction of the different implementations. 
If we are strict about what's in the Parquet format and ignore all the high-level capabilities, then e.g. pyarrow matches about 0% of the format, as it can't access, manipulate, or create Parquet without the intermediate Arrow format (unlike Java and others). ########## content/en/docs/File Format/implementationstatus.md: ########## @@ -0,0 +1,178 @@ +--- +title: "Implementation status" +linkTitle: "Implementation status" +weight: 8 +--- + +### Physical types + ++-------------------------------------------+-------+--------+--------+-------+-------+ +| Data type | C++ | Python | Java | Go | Rust | +| | | | | | | ++===========================================+=======+========+========+=======+=======+ +| BOOLEAN | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| INT32 | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| INT64 | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| INT96 | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| FLOAT | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| DOUBLE | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| BYTE_ARRAY | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| FIXED_LEN_BYTE_ARRAY | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ + +### Logical types + ++-------------------------------------------+-------+--------+--------+-------+-------+ +| Data type | C++ | Python | Java | Go | Rust | +| | | | | | | ++===========================================+=======+========+========+=======+=======+ +| STRING | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| ENUM | | | | 
| | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| UUID | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| 8 and 16 bit signed INT | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| 8, 16, 32, 64 bit unsigned INT | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| DECIMAL (INT32) | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| DECIMAL (INT64) | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| DECIMAL (BYTE_ARRAY) | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| DECIMAL (FIXED_LEN_BYTE_ARRAY) | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| DATE | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| TIME (INT32) | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| TIME (INT64) | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| TIMESTAMP (INT32) | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| TIMESTAMP (INT64) | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| INTERVAL | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| JSON | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| BSON | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| LIST | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| MAP 
| | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| UNKNOWN | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ + +### Encoding + ++-------------------------------------------+-------+--------+--------+-------+-------+ +| | C++ | Python | Java | Go | Rust | +| | | | | | | ++===========================================+=======+========+========+=======+=======+ +| PLAIN | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| PLAIN_DICTIONARY | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| RLE_DICTIONARY | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| RLE | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| BIT_PACKED | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| DELTA_BINARY_PACKED | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| DELTA_LENGTH_BYTE_ARRAY | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| DELTA_BYTE_ARRAY | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| BYTE_STREAM_SPLIT | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ + +### Compression + ++-------------------------------------------+-------+--------+--------+-------+-------+ +| | C++ | Python | Java | Go | Rust | +| | | | | | | ++===========================================+=======+========+========+=======+=======+ +| UNCOMPRESSED | | | | | | ++-------------------------------------------+-------+--------+--------+-------+-------+ +| SNAPPY | | | | | | 
++-------------------------------------------+-------+--------+--------+-------+-------+
+| GZIP | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| LZO | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| BROTLI | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| LZ4 | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| ZSTD | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| LZ4_RAW | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+
+### Other format level features
+
++-------------------------------------------+-------+--------+--------+-------+-------+
+| | C++ | Python | Java | Go | Rust |
+| | | | | | |
++===========================================+=======+========+========+=======+=======+
+| xxHash Bloom filters | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| bloom filter length | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Statistics min_value, max_value | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Column index | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Offset index | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Modular encryption | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Page CRC32 checksum | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Modular encryption | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+
+High level data API-s for parquet feature usage
+===============================================
+
++----------------------------------------------+-------+--------+--------+-------+-------+
+| Format | C++ | Python | Java | Go | Rust |
+| | | | | | |
++==============================================+=======+========+========+=======+=======+
+| Hive-style partitioning | | | | | |
++----------------------------------------------+-------+--------+--------+-------+-------+
+| Partition pruning on the partition column | | | | | |
++----------------------------------------------+-------+--------+--------+-------+-------+
+| External column data | | | | | |
++----------------------------------------------+-------+--------+--------+-------+-------+
+| RowGroup Sorting column | | | | | |

Review Comment:
https://github.com/apache/parquet-format/blob/1603152f8991809e8ad29659dffa224b4284f31b/src/main/thrift/parquet.thrift#L834

##########
content/en/docs/File Format/implementationstatus.md:
##########
@@ -0,0 +1,178 @@
+---
+title: "Implementation status"
+linkTitle: "Implementation status"
+weight: 8
+---
+
+### Physical types
+
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Data type | C++ | Python | Java | Go | Rust |
+| | | | | | |
++===========================================+=======+========+========+=======+=======+
+| BOOLEAN | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| INT32 | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| INT64 | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| INT96 | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| FLOAT | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| DOUBLE | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| BYTE_ARRAY | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| FIXED_LEN_BYTE_ARRAY | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+
+### Logical types
+
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Data type | C++ | Python | Java | Go | Rust |
+| | | | | | |
++===========================================+=======+========+========+=======+=======+
+| STRING | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| ENUM | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| UUID | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| 8 and 16 bit signed INT | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| 8, 16, 32, 64 bit unsigned INT | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| DECIMAL (INT32) | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| DECIMAL (INT64) | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| DECIMAL (BYTE_ARRAY) | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| DECIMAL (FIXED_LEN_BYTE_ARRAY) | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| DATE | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| TIME (INT32) | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| TIME (INT64) | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| TIMESTAMP (INT32) | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| TIMESTAMP (INT64) | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| INTERVAL | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| JSON | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| BSON | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| LIST | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| MAP | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| UNKNOWN | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+
+### Encoding
+
++-------------------------------------------+-------+--------+--------+-------+-------+
+| | C++ | Python | Java | Go | Rust |
+| | | | | | |
++===========================================+=======+========+========+=======+=======+
+| PLAIN | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| PLAIN_DICTIONARY | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| RLE_DICTIONARY | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| RLE | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| BIT_PACKED | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| DELTA_BINARY_PACKED | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| DELTA_LENGTH_BYTE_ARRAY | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| DELTA_BYTE_ARRAY | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| BYTE_STREAM_SPLIT | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+
+### Compression
+
++-------------------------------------------+-------+--------+--------+-------+-------+
+| | C++ | Python | Java | Go | Rust |
+| | | | | | |
++===========================================+=======+========+========+=======+=======+
+| UNCOMPRESSED | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| SNAPPY | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| GZIP | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| LZO | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| BROTLI | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| LZ4 | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| ZSTD | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| LZ4_RAW | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+
+### Other format level features
+
++-------------------------------------------+-------+--------+--------+-------+-------+
+| | C++ | Python | Java | Go | Rust |
+| | | | | | |
++===========================================+=======+========+========+=======+=======+
+| xxHash Bloom filters | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| bloom filter length | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Statistics min_value, max_value | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Column index | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Offset index | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Modular encryption | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Page CRC32 checksum | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Modular encryption | | | | | |
++-------------------------------------------+-------+--------+--------+-------+-------+
+
+High level data API-s for parquet feature usage
+===============================================
+
++----------------------------------------------+-------+--------+--------+-------+-------+
+| Format | C++ | Python | Java | Go | Rust |
+| | | | | | |
++==============================================+=======+========+========+=======+=======+
+| Hive-style partitioning | | | | | |
++----------------------------------------------+-------+--------+--------+-------+-------+
+| Partition pruning on the partition column | | | | | |

Review Comment:
(I don't feel very strongly about this, but also we already have doxygen/sphinx and other API docs which are really good, so... yeah, maybe it's better to have it in a blogpost comparing / demoing the features)
