alamb commented on code in PR #34:
URL: https://github.com/apache/parquet-site/pull/34#discussion_r1639591606


##########
content/en/docs/File Format/implementationstatus.md:
##########
@@ -0,0 +1,101 @@
+---
+title: "Implementation status"
+linkTitle: "Implementation status"
+weight: 8
+---
+### Physical types
+
+| Data type                                 | C++   | Java   | Go    | Rust  |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| BOOLEAN                                   |       |        |       |       |
+| INT32                                     |       |        |       |       |
+| INT64                                     |       |        |       |       |
+| INT96                                     |       |        |       |       |
+| FLOAT                                     |       |        |       |       |
+| DOUBLE                                    |       |        |       |       |
+| BYTE_ARRAY                                |       |        |       |       |
+| FIXED_LEN_BYTE_ARRAY                      |       |        |       |       |
+
+### Logical types
+
+| Data type                                 | C++   | Java   | Go    | Rust  |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| STRING                                    |       |        |       |       |
+| ENUM                                      |       |        |       |       |
+| UUID                                      |       |        |       |       |
+| 8 and 16 bit signed INT                   |       |        |       |       |
+| 8, 16, 32, 64 bit unsigned INT            |       |        |       |       |
+| DECIMAL (INT32)                           |       |        |       |       |
+| DECIMAL (INT64)                           |       |        |       |       |
+| DECIMAL (BYTE_ARRAY)                      |       |        |       |       |
+| DECIMAL (FIXED_LEN_BYTE_ARRAY)            |       |        |       |       |
+| DATE                                      |       |        |       |       |
+| TIME (INT32)                              |       |        |       |       |
+| TIME (INT64)                              |       |        |       |       |
+| TIMESTAMP (INT32)                         |       |        |       |       |
+| TIMESTAMP (INT64)                         |       |        |       |       |
+| INTERVAL                                  |       |        |       |       |
+| JSON                                      |       |        |       |       |
+| BSON                                      |       |        |       |       |
+| LIST                                      |       |        |       |       |
+| MAP                                       |       |        |       |       |
+| UNKNOWN                                   |       |        |       |       |
+
+### Encoding
+
+| Encoding                                  | C++   | Java   | Go    | Rust  |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| PLAIN                                     |       |        |       |       |
+| PLAIN_DICTIONARY                          |       |        |       |       |
+| RLE_DICTIONARY                            |       |        |       |       |
+| RLE                                       |       |        |       |       |
+| BIT_PACKED                                |       |        |       |       |
+| DELTA_BINARY_PACKED                       |       |        |       |       |
+| DELTA_LENGTH_BYTE_ARRAY                   |       |        |       |       |
+| DELTA_BYTE_ARRAY                          |       |        |       |       |
+| BYTE_STREAM_SPLIT                         |       |        |       |       |
+
+### Compression
+
+| Compression                               | C++   | Java   | Go    | Rust  |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| UNCOMPRESSED                              |       |        |       |       |
+| SNAPPY                                    |       |        |       |       |
+| GZIP                                      |       |        |       |       |
+| LZO                                       |       |        |       |       |
+| BROTLI                                    |       |        |       |       |
+| LZ4                                       |       |        |       |       |
+| ZSTD                                      |       |        |       |       |
+| LZ4_RAW                                   |       |        |       |       |
+
+### Other format level features
+
+|                                           | C++   | Java   | Go    | Rust  |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| xxHash Bloom filters                      |       |        |       |       |
+| bloom filter length                       |       |        |       |       |
+| Statistics min_value, max_value           |       |        |       |       |
+| Column index                              |       |        |       |       |
+| Offset index                              |       |        |       |       |
+| Modular encryption                        |       |        |       |       |
+| Page CRC32 checksum                       |       |        |       |       |
+| Modular encryption                        |       |        |       |       |
+
+### High level data API-s for parquet feature usage

Review Comment:
   Many of the features listed below have to be implemented in a query engine,
not just in the Parquet implementation itself.
   
   I suggest removing this section from the first draft to keep things simple:
the page already covers a lot, and we can add this section in a follow-up PR.
   
   Eventually, I think it would make sense to list the reader/writer
capabilities that enable these features.
   
   For example, instead of "row group pruning via statistics" we might list:
   * Writing [`Statistics`](https://github.com/apache/parquet-format/blob/4227b78c06c23639df8d23c576c0ae92eaf64aff/src/main/thrift/parquet.thrift#L244) for all types
   * Reading [`Statistics`](https://github.com/apache/parquet-format/blob/4227b78c06c23639df8d23c576c0ae92eaf64aff/src/main/thrift/parquet.thrift#L244) for all types
   * Reading only selected row groups and skipping the others
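
To illustrate the split, here is a minimal Python sketch of how an engine might combine those three capabilities into row group pruning. It is not any implementation's actual API: `RowGroupStats` and `row_groups_to_read` are hypothetical names, and the statistics are assumed to be plain integer min/max values that a reader has already decoded.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RowGroupStats:
    """Hypothetical stand-in for per-row-group min/max statistics exposed by a reader."""
    index: int
    min_value: Optional[int]
    max_value: Optional[int]


def row_groups_to_read(stats: List[RowGroupStats], lo: int, hi: int) -> List[int]:
    """Engine-side pruning: keep row groups whose [min, max] range may overlap [lo, hi]."""
    selected = []
    for rg in stats:
        # Missing statistics prove nothing, so the row group must still be read.
        if rg.min_value is None or rg.max_value is None:
            selected.append(rg.index)
        # Ranges overlap when neither lies entirely before or after the other.
        elif rg.max_value >= lo and rg.min_value <= hi:
            selected.append(rg.index)
    return selected


stats = [
    RowGroupStats(0, 1, 100),
    RowGroupStats(1, 101, 200),
    RowGroupStats(2, None, None),  # writer did not record statistics
    RowGroupStats(3, 201, 300),
]
selected = row_groups_to_read(stats, lo=150, hi=250)
print(selected)
```

In this sketch the Parquet library only has to write the statistics, expose them, and read the row groups it is asked for. The decision about which row groups to skip lives in the caller, which is why it fits a query engine better than a format-level feature table.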


