crystalxyz commented on code in PR #15646: URL: https://github.com/apache/datafusion/pull/15646#discussion_r2058241720
########## docs/source/library-user-guide/upgrading.md: ########## @@ -19,6 +19,33 @@ # Upgrade Guides +## DataFusion `48.0.0` + +### Processing `Field` instead of `DataType` for user defined functions + +In order to support metadata handling and extension types, user defined functions are +now switching to traits which use `Field` rather than a `DataType` and nullability. +This gives a single interface to both of these parameters and additionally allows +access to metadata fields, which can be used for extension types. + +To upgrade structs which implement `ScalarUDFImpl`, if you have implemented +`return_type_from_args` you need instead to implement `return_field_from_args`. +If your functions do not need to handle metadata, this should be straightforward +repackaging of the output data into a `Field`. The name you specify on the +field is not important. It will be overwritten during planning. `ReturnInfo` Review Comment: Just wondering, it seems a little weird to me here that we ask users to provide a name value but then overwrite it. Is it considered acceptable (because it is part of arrow's Field class), or could we maybe hide this field from the user? ########## datafusion/physical-expr-common/src/physical_expr.rs: ########## @@ -71,11 +71,23 @@ pub trait PhysicalExpr: Send + Sync + Display + Debug + DynEq + DynHash { /// downcast to a specific implementation. fn as_any(&self) -> &dyn Any; /// Get the data type of this expression, given the schema of the input - fn data_type(&self, input_schema: &Schema) -> Result<DataType>; + fn data_type(&self, input_schema: &Schema) -> Result<DataType> { + Ok(self.return_field(input_schema)?.data_type().to_owned()) + } /// Determine whether this expression is nullable, given the schema of the input - fn nullable(&self, input_schema: &Schema) -> Result<bool>; + fn nullable(&self, input_schema: &Schema) -> Result<bool> { + Ok(self.return_field(input_schema)?.is_nullable()) + } /// Evaluate an expression against a RecordBatch fn evaluate(&self, batch: &RecordBatch) -> Result<ColumnarValue>; + /// The output field associated with this expression + fn return_field(&self, input_schema: &Schema) -> Result<Field> { + Ok(Field::new( + format!("{self}"), Review Comment: If I understand correctly, this is one of the places where we overwrite the name of the field? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org