alamb opened a new issue, #9100: URL: https://github.com/apache/arrow-datafusion/issues/9100
### Is your feature request related to a problem or challenge?

As pointed out by @viirya on https://github.com/apache/arrow-datafusion/pull/8705#discussion_r1470110480 there are many potential ways to organize the DataFusion function library as we break it out of the core in https://github.com/apache/arrow-datafusion/issues/8045

I would like to get some consensus on how we want the organization to look before creating tickets and starting to crank it out.

### Describe the solution you'd like

Here is a proposal for how the functions are organized.

## math functions

* feature_flag (new): `math_expressions`
* code location: `datafusion/functions/src/math`
* Abs, Acos, Asin, Atan, Atan2, Acosh, Asinh, Atanh, Cbrt, Ceil, Cos, Cosh, Degrees, Exp, Factorial, Floor, Gcd, Lcm, Ln, Log, Log10, Log2, Pi, Power, Radians, Signum, Sin, Sinh, Sqrt, Tan, Tanh, Trunc, Cot, Round, iszero

## array functions

**Given the size and specialization of these functions I propose putting them into their own subcrate**

* feature_flag (new): `array_expressions`
* code location: `datafusion/functions-array/src/math`
* ArrayAppend, ArraySort, ArrayConcat, ArrayHas, ArrayHasAll, ArrayHasAny, ArrayPopFront, ArrayPopBack, ArrayDims, ArrayDistinct, ArrayElement, ArrayEmpty, ArrayLength, ArrayNdims, ArrayPosition, ArrayPositions, ArrayPrepend, ArrayRemove, ArrayRemoveN, ArrayRemoveAll, ArrayRepeat, ArrayReplace, ArrayReplaceN, ArrayReplaceAll, ArraySlice, ArrayToString, ArrayIntersect, ArrayUnion, ArrayExcept, Cardinality, ArrayResize, Flatten, Range, StringToArray

## Core functions

These functions are always available, either because they are used for internal purposes (like implementing the `[1,2,3]` syntax in SQL) or because they are so commonly used that a feature flag is not worth having.

* feature_flag: NONE
* code location: `datafusion/functions/src/core` or similar
* `MakeArray`: construct an array from columns
* `Isnan`: is the value NaN
* `Coalesce`: return the first non-null value
* `Nanvl`: return the first non-NaN value
* `Struct`: create a struct
* `NullIf`: return null if the two values are equal
* `Random`: return a random number
* `ArrowTypeOf`: return the arrow type of a value

## Crypto functions

* feature_flag (existing): `crypto_expressions`
* code location: `datafusion/functions/src/crypto`
* Digest, MD5, SHA224, SHA256, SHA384, SHA512

## String functions

* feature_flag (new): `string_expressions`
* code location: `datafusion/functions/src/string`
* ascii, bit_length, btrim, chr, concat, concat_ws, ends_with, initcap, instr, lower, ltrim, octet_length, repeat, replace, rtrim, split_part, starts_with, to_hex, trim, upper, levenshtein, uuid, overlay

## Unicode string functions

These expressions need an additional dependency, which is why they have a different flag.

* feature_flag (existing): `unicode_expressions`
* code location: `datafusion/functions/src/string/unicode`
* CharLength, Left, Lpad, Reverse, Right, Rpad, Strpos, Substr, Translate, SubstrIndex, FindInSet

## regex functions

* feature_flag (existing): `regex_expressions`
* code location: `datafusion/functions/src/regexp`
* RegexpMatch, RegexpReplace

## date time functions

* feature_flag (new): `datetime_expressions`
* code location: `datafusion/functions/src/datetime`
* date_part, date_trunc, date_bin, to_timestamp, to_timestamp_millis, to_timestamp_micros, to_timestamp_nanos, to_timestamp_seconds, from_unixtime, now, current_date, current_time

### Describe alternatives you've considered

We could have more fine-grained crates, or a different organization entirely. For example, perhaps we should pull the string functions into a `datafusion/functions-string` crate.

### Additional context

_No response_
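
To make the feature-flag idea concrete, here is a minimal sketch of how the always-on core group and one optional group could be gated at compile time. It is only an illustration under assumptions, not the actual crate layout: the module `core_functions` (standing in for `src/core`), the helper `all_function_names`, and the `names()` functions are all hypothetical, and the sketch assumes the crate's `Cargo.toml` declares a `math_expressions` feature.

```rust
// Hypothetical sketch of `datafusion/functions/src/lib.rs`; not the real crate.
// Assumes Cargo.toml declares a `math_expressions` feature.

/// Stand-in for `src/core`: no feature flag, so it is always compiled.
pub mod core_functions {
    pub fn names() -> Vec<&'static str> {
        vec![
            "MakeArray", "Isnan", "Coalesce", "Nanvl",
            "Struct", "NullIf", "Random", "ArrowTypeOf",
        ]
    }
}

/// Stand-in for `src/math`: only compiled when the flag is enabled.
#[cfg(feature = "math_expressions")]
pub mod math {
    pub fn names() -> Vec<&'static str> {
        // Truncated for brevity; the proposal lists many more.
        vec!["Abs", "Acos", "Asin"]
    }
}

/// Collects the names of every function group compiled into this build.
pub fn all_function_names() -> Vec<&'static str> {
    // `mut` is unused when no optional group is enabled.
    #[allow(unused_mut)]
    let mut names = core_functions::names();

    // This statement disappears entirely when the feature is off.
    #[cfg(feature = "math_expressions")]
    names.extend(math::names());

    names
}
```

With this shape, a build without `--features math_expressions` would still expose `Coalesce` and the other core functions, while `Abs` and friends would only appear when the flag is turned on. The other groups (`string_expressions`, `datetime_expressions`, and so on) could follow the same pattern.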
