Omega359 opened a new issue, #13552: URL: https://github.com/apache/datafusion/issues/13552
### Is your feature request related to a problem or challenge?

One of the things I've been thinking about while working on utf8view support in UDFs is what exactly DataFusion should support in terms of function signature types. Currently we haven't formalized what we expect functions to support, and thus string functions are not consistent in what they accept and what they generate.

@alamb also asked in https://github.com/apache/datafusion/pull/13403#issuecomment-2491701015 whether the level of specialization of a function was indeed required, and whether a proposal for string function guidelines should be made. This is my attempt at such a proposal.

### Describe the solution you'd like

In the context of this proposal, string functions are UDFs that accept and produce strings. This does exclusively mean udf's in `functions/string` and `functions/unicode`.

I would like to propose the following for DataFusion:

1. String functions **MUST** accept both scalar and array values for all data arguments (as opposed to config arguments, such as the 'flags' argument of regex functions).
2. String functions **MUST** accept scalar values for all config arguments but **MAY** accept both scalar and array values if appropriate for the function.
3. String functions **MUST** accept all valid string types for all data arguments. To ease implementation, the type for all data arguments **SHOULD** be coerced to the largest type among all the data arguments.
4. String functions **MAY** choose to allow non-contiguous data types for data arguments, but it is **NOT RECOMMENDED** for functions with 3 or more arguments.
5. String functions **MAY** choose to output Utf8View instead of Utf8 if DataFusion is configured with `schema_force_view_types` == `true`. Otherwise string functions **SHOULD** output string results in the same type as the received primary data argument.
6. String functions **SHOULD** rely on type coercion to handle non-string data. For example, `concat('ab', 2, 'cc')`.
7. String functions **MUST** handle non-control unicode textual character classes unless the function is explicitly designed for a particular character set (ASCII, for example).
8. String functions **SHOULD NOT** attempt to specially handle unicode grapheme characters unless it is directly related to the function's requirements.

### Describe alternatives you've considered

_No response_

### Additional context

I am unsure whether all string functions should be required to handle dictionary types.
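
As a rough illustration of guidelines 3 and 5, the type selection could be sketched as below. This is not DataFusion code: `StringType`, `coerce_data_args`, and `output_type` are hypothetical names introduced only for this example, and the ordering Utf8 < LargeUtf8 < Utf8View is an assumption about what "largest type" means in guideline 3.

```rust
/// Hypothetical stand-in for the Arrow string types named in the proposal.
/// The variant order encodes the ASSUMED ordering Utf8 < LargeUtf8 < Utf8View;
/// the derived `Ord` follows declaration order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum StringType {
    Utf8,
    LargeUtf8,
    Utf8View,
}

/// Guideline 3 (sketch): choose the single type that every data argument
/// is coerced to, here taken to be the largest type among the arguments.
/// Returns `None` when the function has no data arguments.
fn coerce_data_args(arg_types: &[StringType]) -> Option<StringType> {
    arg_types.iter().copied().max()
}

/// Guideline 5 (sketch): a function may emit Utf8View instead of Utf8 when
/// view types are forced; otherwise it returns the type of the primary
/// data argument. `force_view_types` is a hypothetical parameter standing in
/// for the `schema_force_view_types` setting.
fn output_type(primary: StringType, force_view_types: bool) -> StringType {
    if force_view_types && primary == StringType::Utf8 {
        StringType::Utf8View
    } else {
        primary
    }
}
```

Under these assumptions, a function called with `Utf8` and `LargeUtf8` data arguments would coerce both to `LargeUtf8` and return `LargeUtf8`, whether or not view types are forced.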