pitrou commented on code in PR #46532:
URL: https://github.com/apache/arrow/pull/46532#discussion_r2102109410


##########
cpp/src/parquet/properties.h:
##########
@@ -1032,6 +1035,18 @@ class PARQUET_EXPORT ArrowReaderProperties {
     }
   }
 
+  /// \brief Set the Arrow binary type to read BYTE_ARRAY columns as.
+  ///
+  /// Allowed values are Type::BINARY, Type::LARGE_BINARY and Type::BINARY_VIEW.
+  /// Default is Type::BINARY.
+  ///
+  /// If a serialized Arrow schema is found in the Parquet metadata,
+  /// this setting is ignored and the Arrow schema takes precedence
+  /// (see ArrowWriterProperties::store_schema).
+  void set_binary_type(::arrow::Type::type value) { binary_type_ = value; }

Review Comment:
   There are many ways users may want to control how the output schema is 
formed, and unfortunately it seems difficult to provide a single API that 
works for all use cases. For example, what if someone wants to rely on the 
stored Arrow schema but still force all binary columns to binary_view?
   
   (that doesn't mean we shouldn't expose more settings, of course)
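
To make the precedence question concrete, here is a minimal, self-contained sketch of the rule described in the doc comment, plus one possible way the "force binary_view anyway" case could be expressed. It deliberately uses no Arrow types: `BinaryKind`, `ToyReaderOptions`, `override_stored_schema` and `ResolveBinaryKind` are all hypothetical names invented for illustration, not part of this PR or of the Arrow API.

```cpp
#include <optional>

// Toy stand-in for the three binary types named in the doc comment.
// This is NOT ::arrow::Type::type; it only illustrates the precedence rule.
enum class BinaryKind { kBinary, kLargeBinary, kBinaryView };

// Hypothetical reader options. `binary_type` mirrors the setting in the diff;
// `override_stored_schema` is an invented knob for the use case raised above.
struct ToyReaderOptions {
  BinaryKind binary_type = BinaryKind::kBinary;
  bool override_stored_schema = false;
};

// Decide the output type for one BYTE_ARRAY column.
// `stored` is the type recorded in a serialized Arrow schema, if one exists.
inline BinaryKind ResolveBinaryKind(std::optional<BinaryKind> stored,
                                    const ToyReaderOptions& options) {
  if (stored.has_value() && !options.override_stored_schema) {
    // Behavior documented in the diff: the stored Arrow schema wins.
    return *stored;
  }
  // No stored schema, or the caller explicitly asked to override it.
  return options.binary_type;
}
```

With the default options, a column stored as large binary stays large binary even when `binary_type` is set to the view type; only the assumed `override_stored_schema` flag changes that. A single boolean is just one possible shape, and it would not cover per-column choices.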


