Re: [PR] GH-43041: [C++][Python] Read/write Parquet BYTE_ARRAY as Large/View types directly [arrow]

via GitHub Thu, 22 May 2025 02:39:40 -0700


pitrou commented on code in PR #46532:
URL: https://github.com/apache/arrow/pull/46532#discussion_r2102109410



##########
cpp/src/parquet/properties.h:
##########
@@ -1032,6 +1035,18 @@ class PARQUET_EXPORT ArrowReaderProperties {
     }
   }
 
+  /// \brief Set the Arrow binary type to read BYTE_ARRAY columns as.
+  ///
+  /// Allowed values are Type::BINARY, Type::LARGE_BINARY and Type::BINARY_VIEW.
+  /// Default is Type::BINARY.
+  ///
+  /// If a serialized Arrow schema is found in the Parquet metadata,
+  /// this setting is ignored and the Arrow schema takes precedence
+  /// (see ArrowWriterProperties::store_schema).
+  void set_binary_type(::arrow::Type::type value) { binary_type_ = value; }

Review Comment:
   There are many ways that users may want to direct the formation of the 
output schema, and unfortunately it seems difficult to provide an API that 
would work for all use cases. For example, what if someone wants to rely on the 
stored Arrow schema, but still force all binary columns to binary_view instead?
   
   (that doesn't mean we shouldn't expose more settings, of course)
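
A minimal sketch of the precedence question raised above, written as a standalone model rather than against the real Parquet reader. `BinaryType`, `Precedence` and `ResolveBinaryType` are hypothetical names introduced only for illustration. `kStoredSchemaWins` mirrors the behaviour the docstring describes (the setting is ignored when a serialized Arrow schema is present); `kSettingWins` is an assumed extra knob, not part of the PR, covering the "rely on the stored schema but still force binary_view" case for binary columns.

```cpp
#include <optional>

// Hypothetical stand-in for the three values the docstring allows
// (Type::BINARY, Type::LARGE_BINARY, Type::BINARY_VIEW).
enum class BinaryType { kBinary, kLargeBinary, kBinaryView };

// Hypothetical policy knob, NOT part of the PR: whether the reader setting
// may override a binary type recorded in the stored Arrow schema.
enum class Precedence { kStoredSchemaWins, kSettingWins };

// Resolve the output type for one BYTE_ARRAY column.
// `stored` is the binary type taken from the serialized Arrow schema, if any;
// `setting` models the value passed to set_binary_type().
constexpr BinaryType ResolveBinaryType(std::optional<BinaryType> stored,
                                       BinaryType setting,
                                       Precedence precedence) {
  // Behaviour described in the docstring: a stored schema takes precedence.
  if (stored.has_value() && precedence == Precedence::kStoredSchemaWins) {
    return *stored;
  }
  // No stored schema, or the (assumed) override policy is active.
  return setting;
}
```

With `kStoredSchemaWins`, a column stored as `kLargeBinary` stays `kLargeBinary` even if the setting is `kBinaryView`. With `kSettingWins`, the same column resolves to `kBinaryView`. The model only covers the binary type of a single column; in the reviewer's scenario the rest of the stored schema would still be honoured.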




Re: [PR] GH-43041: [C++][Python] Read/write Parquet BYTE_ARRAY as Large/View types directly [arrow]
