xinlifoobar commented on issue #11413: URL: https://github.com/apache/datafusion/issues/11413#issuecomment-2230502588
Sorry, it took longer than I expected to make this work end-to-end.

From my perspective,

Good points:
- Provides a uniform way to implement functions against record batches.
- Saves code.

Bad points:
- Due to the macro implementation, the `global_registry` feature needs to be defined in the crate that references `arrow-udf`; otherwise it will not work.
- It is difficult to leverage Arrow infrastructure crates such as `arrow-string` or `arrow-ord`.
- There is no support for operations between an array and a scalar.
- By default all UDFs are private, and there is no way to reference a UDF so that it could be used in, e.g., `ExprPlanner`.

Neutral:
- The `arrow-udf` interfaces target `RecordBatch` and `Field`, while DataFusion uses `ColumnarValue` and `DataType`. I'd vote for both implementations, but I think `RecordBatch` is the more natural abstraction when taking advantage of `arrow`.
- Arrow types that DataFusion needs, e.g. `Decimal128`, are not supported.

I think we could use `arrow-udf` to replace some string functions that are not supported by `arrow-string`, and so get rid of macros like `compute_utf8_op`. An example would be:

```rust
// declare concat
#[function("concat(string, string) -> string")]
#[function("concat(largestring, largestring) -> largestring")]
fn concat(lhs: &str, rhs: &str) -> String {
    format!("{}{}", lhs, rhs)
}

// reference concat
apply_udf(
    &ColumnarValue::Array(left),
    &ColumnarValue::Array(right),
    &Field::new("", DataType::Utf8, true),
    "concat",
)
```
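To make the array-versus-scalar point concrete, here is a minimal std-only sketch of what an adapter would have to do when one argument is a scalar and the row-level function only works row by row. The `ColumnarValue` enum below is a simplified stand-in, not the real DataFusion type, and `apply_concat` and `into_array` are hypothetical helpers, not part of `arrow-udf` or DataFusion.

```rust
// Simplified stand-in for illustration only; this is NOT the real
// DataFusion `ColumnarValue`, which wraps Arrow arrays and `ScalarValue`.
#[derive(Debug, Clone, PartialEq)]
enum ColumnarValue {
    Array(Vec<Option<String>>),
    Scalar(Option<String>),
}

impl ColumnarValue {
    /// Expand a scalar to `len` rows so a row-wise function can consume it.
    fn into_array(self, len: usize) -> Vec<Option<String>> {
        match self {
            ColumnarValue::Array(values) => values,
            ColumnarValue::Scalar(value) => vec![value; len],
        }
    }
}

/// Row-level function, same body as the `concat` in the comment.
fn concat(lhs: &str, rhs: &str) -> String {
    format!("{}{}", lhs, rhs)
}

/// Hypothetical adapter: applies `concat` row by row, propagating nulls.
fn apply_concat(lhs: ColumnarValue, rhs: ColumnarValue) -> ColumnarValue {
    // Two scalars produce a scalar, so no expansion is needed.
    if let (ColumnarValue::Scalar(l), ColumnarValue::Scalar(r)) = (&lhs, &rhs) {
        let out = match (l, r) {
            (Some(l), Some(r)) => Some(concat(l, r)),
            _ => None,
        };
        return ColumnarValue::Scalar(out);
    }

    // Otherwise take the row count from whichever side is an array.
    let len = match (&lhs, &rhs) {
        (ColumnarValue::Array(a), _) => a.len(),
        (_, ColumnarValue::Array(a)) => a.len(),
        _ => unreachable!("both-scalar case handled above"),
    };

    let left = lhs.into_array(len);
    let right = rhs.into_array(len);
    assert_eq!(left.len(), right.len(), "input arrays must have equal length");

    let out: Vec<Option<String>> = left
        .iter()
        .zip(right.iter())
        .map(|(l, r)| match (l, r) {
            (Some(l), Some(r)) => Some(concat(l, r)),
            _ => None,
        })
        .collect();
    ColumnarValue::Array(out)
}
```

With this sketch, `apply_concat(ColumnarValue::Array(..), ColumnarValue::Scalar(..))` first materializes the scalar as a full column. A real integration would presumably want to avoid that copy, which is part of why the missing array/scalar support matters.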