crystalxyz commented on code in PR #15646:
URL: https://github.com/apache/datafusion/pull/15646#discussion_r2058241720


##########
docs/source/library-user-guide/upgrading.md:
##########
@@ -19,6 +19,33 @@
 
 # Upgrade Guides
 
+## DataFusion `48.0.0`
+
+### Processing `Field` instead of `DataType` for user defined functions
+
+In order to support metadata handling and extension types, user defined 
functions are
+now switching to traits which use `Field` rather than a `DataType` and 
nullability.
+This gives a single interface to both of these parameters and additionally 
allows
+access to metadata fields, which can be used for extension types.
+
+To upgrade structs which implement `ScalarUDFImpl`, if you have implemented
+`return_type_from_args` you need instead to implement `return_field_from_args`.
+If your functions do not need to handle metadata, this should be 
straightforward
+repackaging of the output data into a `Field`. The name you specify on the
+field is not important. It will be overwritten during planning. `ReturnInfo`

Review Comment:
   Just wondering, it seems a little weird to me here that we ask users to 
provide a name value but then overwrite it. Is it considered acceptable 
(because it is part of arrow's Field class), or could we maybe hide this field 
from the user?



##########
datafusion/physical-expr-common/src/physical_expr.rs:
##########
@@ -71,11 +71,23 @@ pub trait PhysicalExpr: Send + Sync + Display + Debug + 
DynEq + DynHash {
     /// downcast to a specific implementation.
     fn as_any(&self) -> &dyn Any;
     /// Get the data type of this expression, given the schema of the input
-    fn data_type(&self, input_schema: &Schema) -> Result<DataType>;
+    fn data_type(&self, input_schema: &Schema) -> Result<DataType> {
+        Ok(self.return_field(input_schema)?.data_type().to_owned())
+    }
     /// Determine whether this expression is nullable, given the schema of the 
input
-    fn nullable(&self, input_schema: &Schema) -> Result<bool>;
+    fn nullable(&self, input_schema: &Schema) -> Result<bool> {
+        Ok(self.return_field(input_schema)?.is_nullable())
+    }
     /// Evaluate an expression against a RecordBatch
     fn evaluate(&self, batch: &RecordBatch) -> Result<ColumnarValue>;
+    /// The output field associated with this expression
+    fn return_field(&self, input_schema: &Schema) -> Result<Field> {
+        Ok(Field::new(
+            format!("{self}"),

Review Comment:
   If I understand correctly, this is one of the places where we overwrite the 
name of the field?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to