alippai commented on code in PR #36027: URL: https://github.com/apache/arrow/pull/36027#discussion_r1226064118
##########
docs/source/status.rst:
##########
@@ -348,3 +348,107 @@ Notes:
 * \(1) Through JNI bindings. (Provided by ``org.apache.arrow.orc:arrow-orc``)
 * \(2) Through JNI bindings to Arrow C++ Datasets. (Provided by ``org.apache.arrow:arrow-dataset``)
+
+
+Parquet format public API details
+=================================
+
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Format                                    | C++   | Python | Java   | Go    | Rust  |
+|                                           |       |        |        |       |       |
++===========================================+=======+========+========+=======+=======+
+| Basic compression                         |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Brotli, LZ4, ZSTD                         |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| LZ4_RAW                                   |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Hive-style partitioning                   |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| File metadata                             |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| RowGroup metadata                         |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Column metadata                           |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Chunk metadata                            |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Sorting column                            |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| ColumnIndex statistics                    |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Page statistics                           |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Statistics min_value                      |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| xxHash based bloom filter                 |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| bloom filter length                       |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Modular encryption                        |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| External column data                      |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Nanosecond support                        |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| FIXED_LEN_BYTE_ARRAY                      |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Complete Delta encoding support           |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Complete RLE support                      |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| BYTE_STREAM_SPLIT                         |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Partition pruning on the partition column |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| RowGroup pruning using statistics         |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| RowGroup pruning using bloom filter       |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Page pruning using projection pushdown    |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Page pruning using statistics             |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Page pruning using bloom filter           |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Partition append / delete                 |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| RowGroup append / delete                  |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Page append / delete                      |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Page CRC32 checksum                       |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Parallel partition processing             |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Parallel RowGroup processing              |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Parallel Page processing                  |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Storage-aware defaults (1)                |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Adaptive concurrency (2)                  |       |        |        |       |       |
++-------------------------------------------+-------+--------+--------+-------+-------+
+| Adaptive IO when pruning used (3)         |       |        |        |       |       |

Review Comment:
I wanted to capture the IO pushdown section of https://arrow.apache.org/blog/2022/12/26/querying-parquet-with-millisecond-latency/#io-pushdown but also added a few more rows. These are likely out of scope, as none of the implementations goes into detail on them or provides an API for them.
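A minimal sketch, assuming the IO pushdown idea from the linked post: once pruning has decided which column chunks or pages are needed, the reader knows the exact byte ranges to fetch and can hand them to the IO layer together. One common technique is to merge ranges separated by small gaps, so that fewer, larger requests are issued, which matters most on high-latency object stores. `coalesce_ranges` and `max_gap` are hypothetical names chosen for illustration, not an API from any Arrow implementation, and the threshold value is arbitrary.

```python
from typing import List, Tuple


def coalesce_ranges(ranges: List[Tuple[int, int]], max_gap: int) -> List[Tuple[int, int]]:
    """Merge half-open byte ranges [start, end) whose gap is at most max_gap.

    Hypothetical helper for illustration only; not an Arrow API.
    """
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(ranges):
        if merged and start - merged[-1][1] <= max_gap:
            # Small gap (or overlap) with the previous range:
            # extend the previous request to cover this range too.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            # Gap too large: issue a separate request.
            merged.append((start, end))
    return merged


# Example: byte ranges of column chunks that survived pruning (made-up values).
needed = [(0, 100), (120, 300), (10_000, 10_500), (10_510, 11_000)]
print(coalesce_ranges(needed, max_gap=64))  # [(0, 300), (10000, 11000)]
```

Reading a few unneeded gap bytes is usually cheaper than an extra round trip; picking `max_gap` per storage backend is one plausible reading of the "adaptive IO" in row (3).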
