pitrou opened a new pull request, #46532: URL: https://github.com/apache/arrow/pull/46532
### Rationale for this change

Parquet has almost no support for LargeBinary and BinaryView data:

* on writing, those types are not supported at all
* on reading, data is decoded as regular Binary data with automatic chunking; if the stored Arrow schema points to a LargeBinary field, the data is later cast to that type

### What changes are included in this PR?

* Refactor the BYTE_ARRAY column decoders to allow decoding directly into a LargeBinaryBuilder or a BinaryViewBuilder
* Add a `binary_type` setting to `ArrowReaderProperties` to change the type that BYTE_ARRAY columns are decoded to by default
* Support reading Parquet GEOMETRY types with LargeBinary or BinaryView storage
* Add benchmarks for reading and writing BinaryView data from/to Parquet
* Add the corresponding Python bindings

### Are these changes tested?

Yes.

### Are there any user-facing changes?

New APIs and improved functionality.
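As a rough illustration of why decoding into plain Binary involves chunking, here is a toy Python model. It is not the code in this PR (which lives in Arrow's C++ Parquet reader), and `OffsetArray`, `decode_as_binary_chunks`, `decode_as_large_binary` and `make_binary_view` are hypothetical names, not Arrow or pyarrow APIs. The model relies on the Arrow columnar format: Binary stores value offsets as 32-bit integers, so one array's data buffer is capped at 2**31 - 1 bytes, while LargeBinary uses 64-bit offsets. BinaryView instead stores a 16-byte "view" per value, inlining values of up to 12 bytes and otherwise recording a 4-byte prefix plus a buffer index and offset. The sketch assumes little-endian encoding.

```python
import struct
from dataclasses import dataclass, field
from typing import List

# Binary uses signed 32-bit offsets: one array's data buffer holds at most this many bytes.
INT32_MAX = 2**31 - 1


@dataclass
class OffsetArray:
    """Toy stand-in for a Binary/LargeBinary array: an offsets list plus a data buffer."""
    offsets: List[int] = field(default_factory=lambda: [0])
    data: bytearray = field(default_factory=bytearray)

    def append(self, value: bytes) -> None:
        self.data += value
        self.offsets.append(len(self.data))

    def __len__(self) -> int:
        return len(self.offsets) - 1

    def value(self, i: int) -> bytes:
        return bytes(self.data[self.offsets[i]:self.offsets[i + 1]])


def decode_as_binary_chunks(values, max_data_bytes: int = INT32_MAX) -> List[OffsetArray]:
    """Start a new chunk whenever appending would overflow the 32-bit offset limit."""
    chunks = [OffsetArray()]
    for v in values:
        if len(v) > max_data_bytes:
            raise ValueError("single value too large for 32-bit offsets")
        if len(chunks[-1].data) + len(v) > max_data_bytes:
            chunks.append(OffsetArray())
        chunks[-1].append(v)
    return chunks


def decode_as_large_binary(values) -> OffsetArray:
    """With 64-bit offsets (LargeBinary), one array is enough in practice."""
    arr = OffsetArray()
    for v in values:
        arr.append(v)
    return arr


def make_binary_view(value: bytes, buffer_index: int = 0, offset: int = 0) -> bytes:
    """Encode one 16-byte BinaryView entry (little-endian assumed).

    Values of up to 12 bytes are inlined (zero-padded); longer values keep a
    4-byte prefix plus the index and offset of their bytes in a data buffer.
    """
    if len(value) <= 12:
        return struct.pack("<i12s", len(value), value)
    return struct.pack("<i4sii", len(value), value[:4], buffer_index, offset)
```

With an artificially small limit, `decode_as_binary_chunks([b"abc", b"defg", b"hi", b"jklmn"], max_data_bytes=8)` yields two chunks of two values each, while `decode_as_large_binary` keeps all four values in a single array. The tiny limit only stands in for the real 2 GiB bound. Decoding directly into a LargeBinary or BinaryView builder avoids both the chunking and the later cast.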