scovich commented on code in PR #8350:
URL: https://github.com/apache/arrow-rs/pull/8350#discussion_r2357600500


##########
arrow-schema/src/field.rs:
##########
@@ -391,6 +398,49 @@ impl Field {
     /// ```
     pub fn with_data_type(mut self, data_type: DataType) -> Self {
         self.set_data_type(data_type);
+        self.without_extension_type()

Review Comment:
   This change seems dangerous... it will silently strip away the extension 
type information, which can propagate into e.g. the parquet file the column 
eventually gets written to, which could be very hard to fix later on once 
readers start noticing the extension type info is unexpectedly missing.
   
   Seems better to make this fallible, and blow up if there is an extension 
type for which the new data type is incompatible? Could potentially still keep 
the original (infallible) version and panic if the new type is invalid. Doing 
that means people upgrading to arrow-57 won't necessarily realize there's a 
potential problem until their code starts crashing, which I don't love, but I 
suspect some people would rather deal with a potential panic than have to 
rewrite all their `with_data_type` calls? (alternatively -- maybe we leave the 
panicky infallible version in place, but deprecated with an appropriate warning?



##########
arrow-schema/src/field.rs:
##########
@@ -366,7 +366,10 @@ impl Field {
         &self.data_type
     }
 
-    /// Set [`DataType`] of the [`Field`]
+    /// Set [`DataType`] of the [`Field`].
+    ///
+    /// This clears extension type information from this field by calling
+    /// [`Field::clear_extension_type`].

Review Comment:
   ```suggestion
       /// Set [`DataType`] of the [`Field`], removing any extension type 
information in the process.
   ```



##########
arrow-schema/src/field.rs:
##########
@@ -378,10 +381,14 @@ impl Field {
     #[inline]
     pub fn set_data_type(&mut self, data_type: DataType) {
         self.data_type = data_type;
+        self.clear_extension_type();
     }
 
     /// Set [`DataType`] of the [`Field`] and returns self.
     ///
+    /// This clears extension type information from the returned field by
+    /// calling [`Field::without_extension_type`].

Review Comment:
   ```suggestion
       /// Returns a new field with the requested [`DataType`].
       ///
       /// Extension type information (if any) is not propagated.
   ```



##########
arrow-schema/src/field.rs:
##########
@@ -378,10 +381,14 @@ impl Field {
     #[inline]
     pub fn set_data_type(&mut self, data_type: DataType) {
         self.data_type = data_type;
+        self.clear_extension_type();

Review Comment:
   If nobody actually uses `set_data_type`, can we deprecate and remove it? 
   It's kind of a weird API surface, especially with the new extension type 
semantics



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to