wgtmac commented on code in PR #36027:
URL: https://github.com/apache/arrow/pull/36027#discussion_r1226080301

##########
docs/source/status.rst:
##########
@@ -348,3 +348,107 @@ Notes:
 * \(1) Through JNI bindings. (Provided by ``org.apache.arrow.orc:arrow-orc``)
 * \(2) Through JNI bindings to Arrow C++ Datasets. (Provided by ``org.apache.arrow:arrow-dataset``)
+
+
+Parquet format public API details
+=================================
+
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Format                                    | C++   | Python | Java   | Go    | Rust  |
+|                                           |       |        |        |       |       |
++===========================================+=======+========+========+=======+=======+
+| Basic compression                         |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Brotli, LZ4, ZSTD                         |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| LZ4_RAW                                   |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Hive-style partitioning                   |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| File metadata                             |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| RowGroup metadata                         |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Column metadata                           |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Chunk metadta                             |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Sorting column                            |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| ColumnIndex statistics                    |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Page statistics                           |       |        |        |       |       |

Review Comment:
   Could we organize these items in a layered fashion?
   Maybe this is a good starting point: https://arrow.apache.org/docs/cpp/parquet.html#supported-parquet-features

##########
docs/source/status.rst:
##########
@@ -348,3 +348,107 @@ Notes:
 * \(1) Through JNI bindings. (Provided by ``org.apache.arrow.orc:arrow-orc``)
 * \(2) Through JNI bindings to Arrow C++ Datasets. (Provided by ``org.apache.arrow:arrow-dataset``)
+
+
+Parquet format public API details
+=================================
+
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Format                                    | C++   | Python | Java   | Go    | Rust  |
+|                                           |       |        |        |       |       |
++===========================================+=======+========+========+=======+=======+
+| Basic compression                         |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Brotli, LZ4, ZSTD                         |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| LZ4_RAW                                   |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Hive-style partitioning                   |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| File metadata                             |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| RowGroup metadata                         |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Column metadata                           |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+

Review Comment:
   Are these rows intended to track the completeness of the fields defined in the metadata? If so, they are probably worth a separate table that indicates the state of each field. But that sounds too complicated.

##########
docs/source/status.rst:
##########
@@ -348,3 +348,107 @@ Notes:
 * \(1) Through JNI bindings. (Provided by ``org.apache.arrow.orc:arrow-orc``)
 * \(2) Through JNI bindings to Arrow C++ Datasets.
(Provided by ``org.apache.arrow:arrow-dataset``)
+
+
+Parquet format public API details
+=================================
+
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Format                                    | C++   | Python | Java   | Go    | Rust  |
+|                                           |       |        |        |       |       |
++===========================================+=======+========+========+=======+=======+
+| Basic compression                         |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Brotli, LZ4, ZSTD                         |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| LZ4_RAW                                   |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Hive-style partitioning                   |       |        |        |       |       |

Review Comment:
   I agree with @tustvold: `partitioning` is more of a high-level use case on top of the file format.

##########
docs/source/status.rst:
##########
@@ -348,3 +348,107 @@ Notes:
 * \(1) Through JNI bindings. (Provided by ``org.apache.arrow.orc:arrow-orc``)
 * \(2) Through JNI bindings to Arrow C++ Datasets. (Provided by ``org.apache.arrow:arrow-dataset``)
+
+
+Parquet format public API details
+=================================
+
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Format                                    | C++   | Python | Java   | Go    | Rust  |

Review Comment:
   The `Java` column could be misleading here. The arrow repo has a Java dataset reader that supports reading from Parquet datasets. If this column is meant for parquet-mr, it can easily get out of sync.

##########
docs/source/status.rst:
##########
@@ -348,3 +348,107 @@ Notes:
 * \(1) Through JNI bindings. (Provided by ``org.apache.arrow.orc:arrow-orc``)
 * \(2) Through JNI bindings to Arrow C++ Datasets.
(Provided by ``org.apache.arrow:arrow-dataset``)
+
+
+Parquet format public API details
+=================================
+
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Format                                    | C++   | Python | Java   | Go    | Rust  |
+|                                           |       |        |        |       |       |
++===========================================+=======+========+========+=======+=======+
+| Basic compression                         |       |        |        |       |       |

Review Comment:
   +1 for this.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
