scovich commented on code in PR #8350: URL: https://github.com/apache/arrow-rs/pull/8350#discussion_r2357600500
########## arrow-schema/src/field.rs: ########## @@ -391,6 +398,49 @@ impl Field { /// ``` pub fn with_data_type(mut self, data_type: DataType) -> Self { self.set_data_type(data_type); + self.without_extension_type() Review Comment: This change seems dangerous... it will silently strip away the extension type information, which can propagate into e.g. the parquet file the column eventually gets written to, which could be very hard to fix later on once readers start noticing the extension type info is unexpectedly missing. Seems better to make this fallible, and blow up if there is an extension type for which the new data type is incompatible? Could potentially still keep the original (infallible) version and panic if the new type is invalid. Doing that means people upgrading to arrow-57 won't necessarily realize there's a potential problem until their code starts crashing, which I don't love, but I suspect some people would rather deal with a potential panic than have to rewrite all their `with_data_type` calls? (alternatively -- maybe we leave the panicky infallible version in place, but deprecated with an appropriate warning? ########## arrow-schema/src/field.rs: ########## @@ -366,7 +366,10 @@ impl Field { &self.data_type } - /// Set [`DataType`] of the [`Field`] + /// Set [`DataType`] of the [`Field`]. + /// + /// This clears extension type information from this field by calling + /// [`Field::clear_extension_type`]. Review Comment: ```suggestion /// Set [`DataType`] of the [`Field`], removing any extension type information in the process. ``` ########## arrow-schema/src/field.rs: ########## @@ -378,10 +381,14 @@ impl Field { #[inline] pub fn set_data_type(&mut self, data_type: DataType) { self.data_type = data_type; + self.clear_extension_type(); } /// Set [`DataType`] of the [`Field`] and returns self. /// + /// This clears extension type information from the returned field by + /// calling [`Field::without_extension_type`]. Review Comment: ```suggestion /// Returns a new field with the requested [`DataType`]. /// /// Extension type information (if any) is not propagated. ``` ########## arrow-schema/src/field.rs: ########## @@ -378,10 +381,14 @@ impl Field { #[inline] pub fn set_data_type(&mut self, data_type: DataType) { self.data_type = data_type; + self.clear_extension_type(); Review Comment: If nobody actually uses `set_data_type`, can we deprecate and remove it? It's kind of a weird API surface, especially with the new extension type semantics -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org