alamb opened a new issue, #9285: URL: https://github.com/apache/arrow-datafusion/issues/9285
### Is your feature request related to a problem or challenge?

As part of making DataFusion even more customizable (https://github.com/apache/arrow-datafusion/issues/8045), it is valuable to let system designers mix and match different packages of functions to get the precise behavior they want (e.g. postgres style `to_date` or spark style `to_date`).

To support this functionality, as well as to ensure the `ScalarUDF` API exposes the full power of DataFusion, we are in the process of extracting the "built in" functions out of the core and into separate crates.

This epic tracks the work to actually move the functions out of the core datafusion crate (spread through `datafusion_expr` and `datafusion-physical-expr`) and into the new `datafusion-functions` / `datafusion-functions-array` crates.

### Describe the solution you'd like

## Tasks:

Here is a list of many of the items necessary to complete this transition. Eventually there should be tickets for all tasks, and there are tickets for some already, but I don't want to make 100s of tickets all at once. I plan to make more as we make it through more of this project. Anyone should feel free to make other tickets if they want to help with items below.
## `math_expressions`

These should be located in the `datafusion-functions` crate ([source link](https://github.com/apache/arrow-datafusion/blob/main/datafusion/functions))

Code location: https://github.com/apache/arrow-datafusion/blob/main/datafusion/functions/src/math/mod.rs

- [x] https://github.com/apache/arrow-datafusion/pull/9216
- [ ] Port `abs` to datafusion_functions
- [ ] Abs, Acos, Asin,
- [ ] Atan, Atan2, Acosh, Asinh, Atanh,
- [ ] Cbrt, Ceil, Cos, Cosh, Degrees, Exp, Factorial,
- [ ] Floor, Gcd, Lcm, Ln, Log, Log10, Log2, Pi, Power,
- [ ] Radians, Signum, Sin, Sinh, Sqrt,
- [ ] Tan, Tanh, Trunc, Cot, Round, iszero

## `array_expressions`

Note that, given their size and specialization, these functions are put in their own subcrate [`datafusion-functions-array`](https://github.com/apache/arrow-datafusion/blob/main/datafusion/functions-array)

- [ ] ArrayToString (TODO find PR)
- [ ] Move the `make_array` function into the `datafusion-functions-array` crate -- TODO file and link ticket
- [ ] ArrayAppend, ArraySort, ArrayConcat, ArrayHas, ArrayHasAll, ArrayHasAny,
- [ ] ArrayPopFront, ArrayPopBack, ArrayDims, ArrayDistinct, ArrayElement,
- [ ] ArrayEmpty, ArrayLength, ArrayNdims, ArrayPosition, ArrayPositions,
- [ ] ArrayPrepend, ArrayRemove, ArrayRemoveN, ArrayRemoveAll, ArrayRepeat,
- [ ] ArrayReplace, ArrayReplaceN, ArrayReplaceAll, ArraySlice,
- [ ] ArrayIntersect, ArrayUnion, ArrayExcept,
- [ ] Cardinality, ArrayResize, Flatten, Range, StringToArray,
- [ ] `MakeArray`: construct an array from columns (union/except depends on this)

## Core functions

These should be located in the `datafusion-functions` crate ([source link](https://github.com/apache/arrow-datafusion/blob/main/datafusion/functions))

Code location: https://github.com/apache/arrow-datafusion/blob/main/datafusion/functions/src/core/mod.rs

- [x] Create `core` module, extract `nullif`: https://github.com/apache/arrow-datafusion/pull/9216
- [ ] Move `ArrowCast` to datafusion-functions
- [ ] Move `ArrowTypeOf`: return the arrow type of a value
- [ ] `Coalesce`: return the first non-null value
- [ ] `Struct`: Create a struct
- [ ] `NullIf`: return null if the two values are equal
- [ ] `Random`: return a random number
- [ ] `Nanvl`: return the first non-NaN value

## `crypto_expressions`

These should be located in the `datafusion-functions` crate ([source link](https://github.com/apache/arrow-datafusion/blob/main/datafusion/functions))

Code location: https://github.com/apache/arrow-datafusion/blob/main/datafusion/functions/src/crypto/mod.rs

- [ ] Create `crypto` module in `datafusion/functions/src/crypto` and `crypto_expressions` feature flag, move `digest` function
- [ ] Digest, MD5, SHA224, SHA256, SHA384, SHA512

## `string_expressions`

These should be located in the `datafusion-functions` crate ([source link](https://github.com/apache/arrow-datafusion/blob/main/datafusion/functions))

Code location: https://github.com/apache/arrow-datafusion/blob/main/datafusion/functions/src/string/mod.rs

- [ ] Create `string` module in `datafusion/functions/src/string` and `string_expressions` feature flag, move `ascii` function
- [ ] ascii, bit_length, btrim, chr,
- [ ] concat, concat_ws, ends_with, initcap,
- [ ] instr, lower, ltrim, octet_length,
- [ ] repeat, replace, rtrim, split_part,
- [ ] starts_with, to_hex, trim, upper,
- [ ] levenshtein, uuid, overlay

## `unicode_expressions`

These should be located in the `datafusion-functions` crate ([source link](https://github.com/apache/arrow-datafusion/blob/main/datafusion/functions))

Code location: https://github.com/apache/arrow-datafusion/blob/main/datafusion/functions/src/unicode/mod.rs

- [ ] Create `unicode` module in `datafusion/functions/src/unicode` and `unicode_expressions` feature flag, move `charlength` function
- [ ] CharLength,
- [ ] Left, Lpad, Reverse, Right, Rpad,
- [ ] Strpos, Substr,
- [ ] Translate, SubstrIndex, FindInSet

## `regex_expressions`

These should be located in the `datafusion-functions` crate ([source link](https://github.com/apache/arrow-datafusion/blob/main/datafusion/functions))

Code location: https://github.com/apache/arrow-datafusion/blob/main/datafusion/functions/src/regexp/mod.rs

- [ ] Create `regex` module in `datafusion/functions/src/regex` and `regex_expressions` feature flag, move `regexp_match`
- [ ] RegexpMatch, RegexpReplace
- [ ] RegexpLike

## `datetime_expressions`

These should be located in the `datafusion-functions` crate ([source link](https://github.com/apache/arrow-datafusion/blob/main/datafusion/functions))

Code location: https://github.com/apache/arrow-datafusion/blob/main/datafusion/functions/src/datetime/mod.rs

- [ ] Create `datetime` module in `datafusion/functions/src/datetime` and `datetime_expressions` feature flag, move `date_part`
- [ ] Port benchmarks to the `datafusion-functions` crate
- [ ] date_part, date_trunc, date_bin,
- [ ] to_timestamp, to_timestamp_millis, to_timestamp_micros, to_timestamp_nanos, to_timestamp_seconds,
- [ ] from_unixtime, now, current_date, current_time

### Describe alternatives you've considered

_No response_

### Additional context

The organization was discussed in https://github.com/apache/arrow-datafusion/issues/9100
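
### Illustrative sketch (not DataFusion's API)

To make the "mix and match different packages of functions" goal more concrete, here is a minimal, standard-library-only sketch of the idea. Everything in it is hypothetical and chosen for illustration: `ScalarFn`, `FunctionRegistry`, `strict_package`, `lenient_package`, and the two `to_date` behaviors are assumptions, not the real `ScalarUDF` API (which operates on Arrow arrays) and not a statement of how postgres or spark actually behave.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a scalar function: one string in, an optional
/// string out. DataFusion's real `ScalarUDF` works on Arrow arrays instead.
type ScalarFn = fn(&str) -> Result<Option<String>, String>;

/// Hypothetical registry mapping SQL function names to implementations.
#[derive(Default)]
struct FunctionRegistry {
    functions: HashMap<String, ScalarFn>,
}

impl FunctionRegistry {
    /// Register every function in a package. A later package overrides an
    /// earlier one when both define the same name.
    fn register_package(&mut self, package: Vec<(&'static str, ScalarFn)>) {
        for (name, f) in package {
            self.functions.insert(name.to_string(), f);
        }
    }

    /// Look up a function by name and invoke it.
    fn call(&self, name: &str, input: &str) -> Result<Option<String>, String> {
        let f = self
            .functions
            .get(name)
            .ok_or_else(|| format!("unknown function: {}", name))?;
        (*f)(input)
    }
}

/// Shape check only: accepts strings that look like `YYYY-MM-DD`.
fn looks_like_iso_date(s: &str) -> bool {
    let bytes = s.as_bytes();
    bytes.len() == 10
        && bytes.iter().enumerate().all(|(i, c)| match i {
            4 | 7 => *c == b'-',
            _ => c.is_ascii_digit(),
        })
}

/// Illustrative "strict" flavor: malformed input is an error.
fn strict_to_date(input: &str) -> Result<Option<String>, String> {
    if looks_like_iso_date(input) {
        Ok(Some(input.to_string()))
    } else {
        Err(format!("invalid date: {}", input))
    }
}

/// Illustrative "lenient" flavor: malformed input becomes NULL (`None`).
fn lenient_to_date(input: &str) -> Result<Option<String>, String> {
    if looks_like_iso_date(input) {
        Ok(Some(input.to_string()))
    } else {
        Ok(None)
    }
}

/// Two hypothetical packages that both provide `to_date`.
fn strict_package() -> Vec<(&'static str, ScalarFn)> {
    vec![("to_date", strict_to_date as ScalarFn)]
}

fn lenient_package() -> Vec<(&'static str, ScalarFn)> {
    vec![("to_date", lenient_to_date as ScalarFn)]
}
```

A system designer would pick the package whose behavior they want and register it; registering a second package that defines the same name replaces the first. Splitting the built-in functions into separate crates is what makes this kind of selection possible.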