nathanielc opened a new issue, #12326: URL: https://github.com/apache/datafusion/issues/12326
### Is your feature request related to a problem or challenge?

I often work with a column that has a binary type, but whose data is known to be a valid UTF-8 string. I'd like a mechanism to work with such data easily so that I can use other string scalar functions on it.

### Describe the solution you'd like

Add a scalar function that interprets a binary column as a UTF-8 string. This allows for explicit conversion between the data types. The function could be named `str_from_utf8` or similar.

How should invalid UTF-8 be handled? I see two options:

* Report an error, causing the entire query to fail.
* Return null for that row.

I can see both being useful. Does this mean we want two functions, or a single function with a flag for its error behavior? Or is there a convention to follow for fallible scalar functions?

### Describe alternatives you've considered

We could also add an encoding format to the `encode` and `decode` functions, as they already have the function signature of binary <-> utf8. However, the existing formats are about encoding arbitrary bytes, not about interpreting bytes as another format.

### Additional context

I'd be happy to contribute this scalar function if we decide it's a good solution to the problem.
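
To make the two options for invalid UTF-8 concrete, here is a minimal sketch in plain Rust using only the standard library. It is not DataFusion code: the function names `utf8_strict` and `utf8_or_null` are hypothetical, and a column is modeled as a slice of optional byte slices rather than an Arrow array.

```rust
use std::str::{from_utf8, Utf8Error};

/// Hypothetical "error" variant: the first invalid value fails the whole
/// conversion, analogous to failing the entire query.
/// Existing nulls (`None`) pass through unchanged.
fn utf8_strict<'a>(values: &[Option<&'a [u8]>]) -> Result<Vec<Option<&'a str>>, Utf8Error> {
    values
        .iter()
        .map(|&v| v.map(from_utf8).transpose())
        .collect()
}

/// Hypothetical "null" variant: an invalid value becomes `None` for that row
/// only, and the remaining rows are still converted.
fn utf8_or_null<'a>(values: &[Option<&'a [u8]>]) -> Vec<Option<&'a str>> {
    values
        .iter()
        .map(|&v| v.and_then(|bytes| from_utf8(bytes).ok()))
        .collect()
}
```

Both variants borrow the input bytes instead of copying them, since validation alone is enough to reinterpret valid data. One trade-off of the null variant is that a row that was already null and a row that failed validation become indistinguishable in the output.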