timsaucer commented on code in PR #22705:
URL: https://github.com/apache/datafusion/pull/22705#discussion_r3386981785


##########
.github/workflows/dependencies.yml:
##########
@@ -39,7 +39,7 @@ jobs:
     name: Circular Dependency Check
     runs-on: ubuntu-latest
     container:
-      image: amd64/rust

Review Comment:
   Why are these changes necessary? They seem unrelated to the current PR.



##########
datafusion/ffi/src/udwf/expression_args.rs:
##########
@@ -0,0 +1,89 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use std::sync::Arc;
+
+use arrow::ffi::FFI_ArrowSchema;
+use arrow_schema::FieldRef;
+use datafusion_common::{DataFusionError, Result};
+use datafusion_expr::function::ExpressionArgs;
+use datafusion_physical_expr::PhysicalExpr;
+use stabby::vec::Vec as SVec;
+
+use crate::arrow_wrappers::WrappedSchema;
+use crate::physical_expr::FFI_PhysicalExpr;
+use crate::util::rvec_wrapped_to_vec_fieldref;
+
+/// A stable struct for sharing [`ExpressionArgs`] across FFI boundaries.
+#[repr(C)]
+#[derive(Debug)]
+pub struct FFI_ExpressionArgs {
+    input_exprs: SVec<FFI_PhysicalExpr>,
+    input_fields: SVec<WrappedSchema>,
+}
+
+impl TryFrom<ExpressionArgs<'_>> for FFI_ExpressionArgs {
+    type Error = DataFusionError;
+
+    fn try_from(args: ExpressionArgs) -> Result<Self> {
+        let input_exprs = args
+            .input_exprs()
+            .iter()
+            .map(Arc::clone)
+            .map(FFI_PhysicalExpr::from)
+            .collect();
+
+        let input_fields = args
+            .input_fields()
+            .iter()
+            .map(|field| 
FFI_ArrowSchema::try_from(field.as_ref()).map(WrappedSchema))
+            .collect::<Result<Vec<_>, _>>()?
+            .into_iter()
+            .collect();
+
+        Ok(Self {
+            input_exprs,
+            input_fields,
+        })
+    }
+}
+
+pub struct ForeignExpressionArgs {
+    input_exprs: Vec<Arc<dyn PhysicalExpr>>,
+    input_fields: Vec<FieldRef>,
+}

Review Comment:
   Why have a `ForeignExpressionArgs`? We aren't implementing a trait, which is 
where that pattern is needed. This seems like an unnecessary layer. Is it 
because `ExpressionArgs` requires a borrow and we need an owned struct?
   
   If that's the case then I think we should follow the naming convention like 
`ForeignReturnFieldArgsOwned` because these expression arguments are not 
actually foreign at this point. If you're using an agent and it generated this 
naming convention due to the FFI skill in our repository, then it would be good 
to update the skill to reflect the fact that it missed this convention.



##########
datafusion/ffi/src/udwf/mod.rs:
##########
@@ -74,17 +81,35 @@ pub struct FFI_WindowUDF {
         display_name: SString,
     ) -> FFI_Result<WrappedSchema>,
 
-    /// Performs type coercion. To simply this interface, all UDFs are treated 
as having
-    /// user defined signatures, which will in turn call coerce_types to be 
called. This
-    /// call should be transparent to most users as the internal function 
performs the
-    /// appropriate calls on the underlying [`WindowUDF`]
+    /// Pointer lifetime is tied to the inner Arc; null = None
+    pub documentation: unsafe extern "C" fn(udwf: &Self) -> *const 
Documentation,

Review Comment:
   I was going to comment on this but it looks like Copilot has identified the 
same problem. can you address the above comment?



##########
datafusion/ffi/src/udwf/mod.rs:
##########
@@ -74,17 +81,35 @@ pub struct FFI_WindowUDF {
         display_name: SString,
     ) -> FFI_Result<WrappedSchema>,
 
-    /// Performs type coercion. To simply this interface, all UDFs are treated 
as having
-    /// user defined signatures, which will in turn call coerce_types to be 
called. This
-    /// call should be transparent to most users as the internal function 
performs the
-    /// appropriate calls on the underlying [`WindowUDF`]
+    /// Pointer lifetime is tied to the inner Arc; null = None
+    pub documentation: unsafe extern "C" fn(udwf: &Self) -> *const 
Documentation,
+
+    /// Returns expressions in the same order as input_exprs
+    pub expressions: unsafe extern "C" fn(
+        udwf: &Self,
+        args: FFI_ExpressionArgs,
+    ) -> SVec<FFI_PhysicalExpr>,
+
+    /// Serializes WindowFunction via DefaultLogicalExtensionCodec;
+    /// returns None variant if no simplification; only called when 
has_simplify=true

Review Comment:
   We definitely do not want to use the default logical extension codec here. 



##########
datafusion/ffi/src/udwf/mod.rs:
##########
@@ -358,6 +488,83 @@ impl WindowUDFImpl for ForeignWindowUDF {
         }
     }
 
+    fn documentation(&self) -> Option<&Documentation> {
+        unsafe {
+            let ptr = (self.udf.documentation)(&self.udf);
+            ptr.as_ref()
+        }
+    }
+
+    fn expressions(
+        &self,
+        expr_args: datafusion_expr::function::ExpressionArgs,
+    ) -> Vec<Arc<dyn PhysicalExpr>> {
+        unsafe {
+            let fallback = expr_args.input_exprs().to_vec();
+            let args = match FFI_ExpressionArgs::try_from(expr_args) {
+                Ok(args) => args,
+                Err(_) => return fallback,
+            };
+            (self.udf.expressions)(&self.udf, args)
+                .into_iter()
+                .map(|e| Arc::<dyn PhysicalExpr>::from(&e))
+                .collect()
+        }
+    }
+
+    fn simplify(&self) -> Option<WindowFunctionSimplification> {
+        if !self.udf.has_simplify {
+            return None;
+        }
+
+        let udf = self.udf.clone();
+        Some(Box::new(move |wf, info| {
+            let codec = 
datafusion_proto::logical_plan::DefaultLogicalExtensionCodec {};
+
+            // To serialize the window function
+            let expr = Expr::WindowFunction(Box::new(wf));
+            let protobuf = 
datafusion_proto::logical_plan::to_proto::serialize_expr(
+                &expr, &codec,
+            )

Review Comment:
   I'm unsure why we're doing this serialization to cross the boundary.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to