wgtmac commented on code in PR #36027:
URL: https://github.com/apache/arrow/pull/36027#discussion_r1226080301


##########
docs/source/status.rst:
##########
@@ -348,3 +348,107 @@ Notes:
 * \(1) Through JNI bindings. (Provided by ``org.apache.arrow.orc:arrow-orc``)
 
 * \(2) Through JNI bindings to Arrow C++ Datasets. (Provided by ``org.apache.arrow:arrow-dataset``)
+
+
+Parquet format public API details
+=================================
+
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Format                                    | C++   | Python | Java   | Go    | Rust  |
+|                                           |       |        |        |       |       |
++===========================================+=======+========+========+=======+=======+
+| Basic compression                         |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Brotli, LZ4, ZSTD                         |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| LZ4_RAW                                   |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Hive-style partitioning                   |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| File metadata                             |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| RowGroup metadata                         |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Column metadata                           |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Chunk metadta                             |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Sorting column                            |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| ColumnIndex statistics                    |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Page statistics                           |       |        |        |       |       |

Review Comment:
   Could we organize these items in a layered fashion? Maybe this is a good 
starting point: 
https://arrow.apache.org/docs/cpp/parquet.html#supported-parquet-features



##########
docs/source/status.rst:
##########
@@ -348,3 +348,107 @@ Notes:
 * \(1) Through JNI bindings. (Provided by ``org.apache.arrow.orc:arrow-orc``)
 
 * \(2) Through JNI bindings to Arrow C++ Datasets. (Provided by ``org.apache.arrow:arrow-dataset``)
+
+
+Parquet format public API details
+=================================
+
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Format                                    | C++   | Python | Java   | Go    | Rust  |
+|                                           |       |        |        |       |       |
++===========================================+=======+========+========+=======+=======+
+| Basic compression                         |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Brotli, LZ4, ZSTD                         |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| LZ4_RAW                                   |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Hive-style partitioning                   |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| File metadata                             |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| RowGroup metadata                         |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Column metadata                           |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+

Review Comment:
   Are these rows meant to track how completely the fields defined in the 
metadata are supported? If so, they probably deserve a separate table that 
indicates the state of each field. But that sounds too complicated.



##########
docs/source/status.rst:
##########
@@ -348,3 +348,107 @@ Notes:
 * \(1) Through JNI bindings. (Provided by ``org.apache.arrow.orc:arrow-orc``)
 
 * \(2) Through JNI bindings to Arrow C++ Datasets. (Provided by ``org.apache.arrow:arrow-dataset``)
+
+
+Parquet format public API details
+=================================
+
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Format                                    | C++   | Python | Java   | Go    | Rust  |
+|                                           |       |        |        |       |       |
++===========================================+=======+========+========+=======+=======+
+| Basic compression                         |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Brotli, LZ4, ZSTD                         |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| LZ4_RAW                                   |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Hive-style partitioning                   |       |        |        |       |       |

Review Comment:
   I agree with @tustvold: `partitioning` is more of a high-level use case 
built on top of the file format.
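
   To illustrate the point, here is a minimal sketch (not Arrow's 
implementation) showing that Hive-style partition values can be recovered from 
directory names alone, without opening any Parquet file. The function name 
`parse_hive_partitions` and the example path are hypothetical, chosen only for 
illustration.

```python
from pathlib import PurePosixPath


def parse_hive_partitions(path: str) -> dict[str, str]:
    """Collect key=value directory segments from a Hive-style path.

    Hypothetical helper for illustration only; it looks at the path,
    never at the contents of the Parquet file itself.
    """
    directories = PurePosixPath(path).parts[:-1]  # drop the file name
    partitions = {}
    for segment in directories:
        key, sep, value = segment.partition("=")
        if sep and key:  # keep only segments shaped like key=value
            partitions[key] = value
    return partitions


# Example path is made up; the values come back as plain strings.
print(parse_hive_partitions("sales/year=2023/month=06/part-0.parquet"))
```

   Because the partition keys live entirely in the directory layout, this 
kind of logic seems to belong to a dataset layer rather than to a table of 
file-format features.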



##########
docs/source/status.rst:
##########
@@ -348,3 +348,107 @@ Notes:
 * \(1) Through JNI bindings. (Provided by ``org.apache.arrow.orc:arrow-orc``)
 
 * \(2) Through JNI bindings to Arrow C++ Datasets. (Provided by ``org.apache.arrow:arrow-dataset``)
+
+
+Parquet format public API details
+=================================
+
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Format                                    | C++   | Python | Java   | Go    | Rust  |

Review Comment:
   The `Java` column could be misleading here. The arrow repo has a Java 
dataset reader that supports reading from Parquet datasets. If this column is 
meant for parquet-mr, it can easily get out of sync.



##########
docs/source/status.rst:
##########
@@ -348,3 +348,107 @@ Notes:
 * \(1) Through JNI bindings. (Provided by ``org.apache.arrow.orc:arrow-orc``)
 
 * \(2) Through JNI bindings to Arrow C++ Datasets. (Provided by ``org.apache.arrow:arrow-dataset``)
+
+
+Parquet format public API details
+=================================
+
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Format                                    | C++   | Python | Java   | Go    | Rust  |
+|                                           |       |        |        |       |       |
++===========================================+=======+========+========+=======+=======+
+| Basic compression                         |       |        |        |       |       |

Review Comment:
   +1 for this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
