alamb commented on code in PR #9304:
URL: https://github.com/apache/arrow-datafusion/pull/9304#discussion_r1504135026


##########
datafusion/core/tests/user_defined/user_defined_scalar_functions.rs:
##########
@@ -498,6 +503,83 @@ async fn test_user_defined_functions_zero_argument() -> 
Result<()> {
     Ok(())
 }
 
+#[derive(Debug)]
+struct CastToI64UDF {
+    signature: Signature,
+}
+
+impl CastToI64UDF {
+    fn new() -> Self {
+        Self {
+            signature: Signature::any(1, Volatility::Immutable),
+        }
+    }
+}
+
+impl ScalarUDFImpl for CastToI64UDF {
+    fn as_any(&self) -> &dyn Any {
+        self
+    }
+    fn name(&self) -> &str {
+        "cast_to_i64"
+    }
+    fn signature(&self) -> &Signature {
+        &self.signature
+    }
+    fn return_type(&self, _args: &[DataType]) -> Result<DataType> {
+        Ok(DataType::Int64)
+    }
+    // Wrap with Expr::Cast() to Int64
+    fn simplify(&self, args: &[Expr]) -> Result<Simplified> {
+        let dfs = DFSchema::new_with_metadata(

Review Comment:
   This is a great example, thank you @jayzhan211 
   
   It seems to me that the UDF can't always know what the input schema would 
be, and thus the schema should be passed in.
   
   Something like this
   
   ```rust
       fn simplify(&self, args: &[Expr], schema: &DFSchema) -> 
Result<Simplified> {
   ...
        }
   ```
   
   What do you think?
   



##########
datafusion/core/tests/user_defined/user_defined_scalar_functions.rs:
##########
@@ -498,6 +503,83 @@ async fn test_user_defined_functions_zero_argument() -> 
Result<()> {
     Ok(())
 }
 
+#[derive(Debug)]
+struct CastToI64UDF {
+    signature: Signature,
+}
+
+impl CastToI64UDF {
+    fn new() -> Self {
+        Self {
+            signature: Signature::any(1, Volatility::Immutable),
+        }
+    }
+}
+
+impl ScalarUDFImpl for CastToI64UDF {
+    fn as_any(&self) -> &dyn Any {
+        self
+    }
+    fn name(&self) -> &str {
+        "cast_to_i64"
+    }
+    fn signature(&self) -> &Signature {
+        &self.signature
+    }
+    fn return_type(&self, _args: &[DataType]) -> Result<DataType> {
+        Ok(DataType::Int64)
+    }
+    // Wrap with Expr::Cast() to Int64
+    fn simplify(&self, args: &[Expr]) -> Result<Simplified> {
+        let dfs = DFSchema::new_with_metadata(
+            vec![DFField::new(Some("t"), "x", DataType::Float32, true)],
+            HashMap::default(),
+        )?;
+        let e = args[0].to_owned();
+        let casted_expr = e.cast_to(&DataType::Int64, &dfs)?;
+        Ok(Simplified::Rewritten(casted_expr))
+    }
+    // Casting should be done in `simplify`, so we just return the first 
argument
+    fn invoke(&self, args: &[ColumnarValue]) -> Result<ColumnarValue> {
+        assert_eq!(args.len(), 1);
+        Ok(args.first().unwrap().clone())
+    }
+}
+
+#[tokio::test]
+async fn test_user_defined_functions_cast_to_i64() -> Result<()> {

Review Comment:
   Thank you so much for this test / example -- it makes seeing how the API 
would work really clear. 👏 



##########
datafusion/expr/src/udf.rs:
##########
@@ -337,6 +353,11 @@ pub trait ScalarUDFImpl: Debug + Send + Sync {
     fn monotonicity(&self) -> Result<Option<FuncMonotonicity>> {
         Ok(None)
     }
+
+    // Do the function rewrite
+    fn simplify(&self, _args: &[Expr]) -> Result<Simplified> {
+        Ok(Simplified::Original)
+    }

Review Comment:
   Is it possible to make this take `self` (and avoid a copy)? Something like
   
   ```suggestion
       fn simplify(self, _args: &[Expr]) -> Result<Simplified> {
           Ok(Simplified::Original(self))
       }
   ```?
   



##########
datafusion/optimizer/src/analyzer/rewrite_expr.rs:
##########
@@ -319,3 +320,67 @@ fn rewrite_array_concat_operator_to_func_for_column(
         _ => Ok(None),
     }
 }
+
+#[derive(Default)]
+pub struct FunctionSimplifiction {}
+
+impl FunctionSimplifiction {
+    pub fn new() -> Self {
+        Self {}
+    }
+}
+
+impl AnalyzerRule for FunctionSimplifiction {
+    fn name(&self) -> &str {
+        "function_simplification"
+    }
+
+    fn analyze(&self, plan: LogicalPlan, _: &ConfigOptions) -> 
Result<LogicalPlan> {
+        function_simplication_analyze_internal(&plan)
+    }
+}
+
+fn function_simplication_analyze_internal(plan: &LogicalPlan) -> 
Result<LogicalPlan> {

Review Comment:
   I wonder if there is any reason calling simplify needs to be its own 
analyzer pass? Can't it be called as part of the existing simplification code?
   
   ```
   logical_plan after simplify_expressions SAME TEXT AS ABOVE
   ```
   
   That would also allow this function to be used by anyone else that used the 
simplification API directly: 
https://github.com/apache/arrow-datafusion/blob/8f3d1ef23f93cd4303745eba76c0850b39774d07/datafusion-examples/examples/expr_api.rs#L113



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to