alamb commented on code in PR #14837: URL: https://github.com/apache/datafusion/pull/14837#discussion_r2083479793
##########
datafusion/core/src/physical_planner.rs:
##########
@@ -775,12 +776,44 @@ impl DefaultPhysicalPlanner {
                 let runtime_expr =
                     self.create_physical_expr(predicate, input_dfschema, session_state)?;
+
+                let filter = match self.try_plan_async_exprs(

Review Comment:
   I think at a really high level this pattern is basically the same as ["Common Subexpression Elimination"](https://github.com/apache/datafusion/blob/main/datafusion/optimizer/src/common_subexpr_eliminate.rs) and many of the other optimizer passes: pulling some subset of the expressions into a new node and rewriting the others.

   If we want to avoid having to follow the same model, I think we could follow the model of some of the other recent optimizer passes and add a method to `ExecutionPlan`, something like this perhaps:

   ```rust
   trait ExecutionPlan {
       /// Factor all async expressions in this ExecutionPlan out of any internal expressions,
       /// returning a list of such async expressions and the rewritten plan.
       ///
       /// The async expression values will be provided to the rewritten plan after all the
       /// existing input columns.
       fn rewrite_async(&self) -> Transformed<(Vec<AsyncExpr>, Arc<dyn ExecutionPlan>)> {
           // Default to not supporting async functions.
           // (Sketch only: `Transformed::no` needs the unchanged value passed in.)
           Transformed::no()
       }
   }
   ```

##########
datafusion-examples/examples/async_udf.rs:
##########
@@ -0,0 +1,256 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use arrow::array::{ArrayIter, ArrayRef, AsArray, Int64Array, RecordBatch, StringArray};
+use arrow::compute::kernels::cmp::eq;
+use arrow_schema::{DataType, Field, Schema};
+use async_trait::async_trait;
+use datafusion::common::error::Result;
+use datafusion::common::internal_err;
+use datafusion::common::types::{logical_int64, logical_string};
+use datafusion::common::utils::take_function_args;
+use datafusion::config::ConfigOptions;
+use datafusion::execution::{FunctionRegistry, SessionStateBuilder};
+use datafusion::logical_expr::async_udf::{
+    AsyncScalarFunctionArgs, AsyncScalarUDF, AsyncScalarUDFImpl,
+};
+use datafusion::logical_expr::{
+    ColumnarValue, Signature, TypeSignature, TypeSignatureClass, Volatility,
+};
+use datafusion::logical_expr_common::signature::Coercion;
+use datafusion::physical_expr_common::datum::apply_cmp;
+use datafusion::prelude::SessionContext;
+use log::trace;
+use std::any::Any;
+use std::sync::Arc;
+
+#[tokio::main]

Review Comment:
   It would be nice to add some high-level context to this example, like an introduction saying that most functions are sync, but some functions can be run as async ... I can help with this potentially.

   It would also be awesome to put this example / code in the docs (https://datafusion.apache.org/library-user-guide/adding-udfs.html) so it is easier to find.
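For readers following the first comment, here is a minimal, self-contained sketch of the "pull some expressions into a new node and rewrite the others" pattern. It uses only the standard library. `Expr`, `factor_async`, and the `ask_llm` function name in the usage note are hypothetical toy names chosen for illustration; none of them are DataFusion APIs, and this is not the approach taken in the PR.

```rust
/// Toy expression tree (hypothetical; not DataFusion's `Expr` or `PhysicalExpr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// Reference to an input column by position.
    Column(usize),
    /// Integer literal.
    Literal(i64),
    /// Equality comparison between two sub-expressions.
    Eq(Box<Expr>, Box<Expr>),
    /// Call to an async function, identified by name.
    AsyncCall(String, Vec<Expr>),
}

/// Replace every async call in `expr` with a reference to a new column and
/// collect the removed calls in `factored`.
///
/// New columns are numbered after the `num_input_columns` existing input
/// columns, matching the "after all the existing input columns" convention
/// in the review comment. Async calls nested inside the arguments of another
/// async call are left untouched in this sketch.
fn factor_async(expr: Expr, num_input_columns: usize, factored: &mut Vec<Expr>) -> Expr {
    match expr {
        Expr::AsyncCall(name, args) => {
            // The value of this call will arrive in the next free column.
            let index = num_input_columns + factored.len();
            factored.push(Expr::AsyncCall(name, args));
            Expr::Column(index)
        }
        Expr::Eq(left, right) => {
            let left = factor_async(*left, num_input_columns, factored);
            let right = factor_async(*right, num_input_columns, factored);
            Expr::Eq(Box::new(left), Box::new(right))
        }
        // Columns and literals contain no async calls.
        other => other,
    }
}
```

With two input columns, calling `factor_async` on a predicate like `ask_llm(col0) = 1` would return `col2 = 1` and leave the `ask_llm(col0)` call in the `factored` list, so a separate node can evaluate it and append its result as column 2.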