pitrou commented on code in PR #46532:
URL: https://github.com/apache/arrow/pull/46532#discussion_r2102109410


##########
cpp/src/parquet/properties.h:
##########
@@ -1032,6 +1035,18 @@ class PARQUET_EXPORT ArrowReaderProperties {
     }
   }
 
+  /// \brief Set the Arrow binary type to read BYTE_ARRAY columns as.
+  ///
+  /// Allowed values are Type::BINARY, Type::LARGE_BINARY and Type::BINARY_VIEW.
+  /// Default is Type::BINARY.
+  ///
+  /// If a serialized Arrow schema is found in the Parquet metadata,
+  /// this setting is ignored and the Arrow schema takes precedence
+  /// (see ArrowWriterProperties::store_schema).
+  void set_binary_type(::arrow::Type::type value) { binary_type_ = value; }

Review Comment:
   There are many ways that users may want to direct the formation of the output schema, and unfortunately it seems difficult to provide an API that would work for all use cases. For example, what if someone wants to rely on the stored Arrow schema, but still force all binary columns to binary_view instead?
   
   (that doesn't mean we shouldn't expose more settings, of course)
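
   To make the precedence question concrete, here is a minimal standalone sketch of the two behaviours being contrasted. It deliberately does not use the Arrow or Parquet headers: `BinaryType`, `BinaryTypePolicy`, `Column` and `ResolveBinaryType` are hypothetical names invented for this illustration, not existing or proposed API. The first policy mirrors the doc comment in the diff (a stored Arrow schema wins), the second models the "force all binary columns" use case.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for the binary-like members of ::arrow::Type::type.
enum class BinaryType { BINARY, LARGE_BINARY, BINARY_VIEW };

// Hypothetical policy knob; nothing like this exists in the PR.
enum class BinaryTypePolicy {
  kStoredSchemaWins,    // behaviour described in the doc comment of the diff
  kForceReaderSetting,  // use case raised in the review comment
};

// Hypothetical model of one BYTE_ARRAY column.
struct Column {
  // Binary type recorded in the serialized Arrow schema, if one was stored.
  std::optional<BinaryType> stored_type;
};

// Pick the Arrow type to read the column as.
inline BinaryType ResolveBinaryType(const Column& col, BinaryType reader_setting,
                                    BinaryTypePolicy policy) {
  if (policy == BinaryTypePolicy::kStoredSchemaWins && col.stored_type.has_value()) {
    return *col.stored_type;  // the reader setting is ignored
  }
  // No stored schema, or the caller asked to override it.
  return reader_setting;
}

// Usage example: a column written as LARGE_BINARY with the schema stored.
inline void UsageExample() {
  Column col{BinaryType::LARGE_BINARY};
  assert(ResolveBinaryType(col, BinaryType::BINARY_VIEW,
                           BinaryTypePolicy::kStoredSchemaWins) ==
         BinaryType::LARGE_BINARY);
  assert(ResolveBinaryType(col, BinaryType::BINARY_VIEW,
                           BinaryTypePolicy::kForceReaderSetting) ==
         BinaryType::BINARY_VIEW);
}
```

   A single setter cannot express both rows of this table at once, which is the difficulty the comment points at; whether a second setting, a policy enum, or something else is the right answer is left open here.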