nathanielc opened a new issue, #12326:
URL: https://github.com/apache/datafusion/issues/12326

   ### Is your feature request related to a problem or challenge?
   
   I often work with a column of binary type whose data is known to be a 
valid UTF-8 string. I'd like a mechanism to easily work with such data so that 
I can use the existing string scalar functions on it.
   
   ### Describe the solution you'd like
   
   Add a scalar function that interprets a binary column as a UTF-8 string. 
This allows for explicit conversion between the two data types. The function 
could be named `str_from_utf8` or similar.
   
   How should invalid UTF-8 be handled? I see two options: 
   
   * Report an error, causing the entire query to fail.
   * Return null for that row.
   
   I can see both being useful. Does this mean we want two functions, or a 
single function with a flag for its error behavior? Or is there a convention to 
follow for fallible scalar functions?
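
To make the two options concrete, here is a rough sketch of both behaviors. It uses only the Rust standard library and plain slices, not DataFusion's UDF or Arrow array APIs. The names `str_from_utf8_strict` and `str_from_utf8_or_null` are hypothetical, and a row is modeled as `Option<&[u8]>` where `None` stands for SQL NULL.

```rust
/// Option 1 (hypothetical name): the first invalid row aborts the whole
/// conversion, which would correspond to failing the query.
fn str_from_utf8_strict<'a>(
    rows: &[Option<&'a [u8]>],
) -> Result<Vec<Option<&'a str>>, std::str::Utf8Error> {
    rows.iter()
        // NULL rows stay NULL; non-NULL rows must validate as UTF-8.
        .map(|row| row.map(std::str::from_utf8).transpose())
        .collect()
}

/// Option 2 (hypothetical name): invalid rows are replaced by NULL and the
/// conversion always succeeds.
fn str_from_utf8_or_null<'a>(rows: &[Option<&'a [u8]>]) -> Vec<Option<&'a str>> {
    rows.iter()
        .map(|row| row.and_then(|bytes| std::str::from_utf8(bytes).ok()))
        .collect()
}
```

Given the rows `b"abc"`, NULL, and the invalid bytes `\xff\xfe`, the strict variant returns an error, while the lenient variant returns `"abc"`, NULL, NULL. Note that in the lenient variant an invalid row becomes indistinguishable from an input NULL, which is part of why both behaviors may be worth exposing.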
   
   ### Describe alternatives you've considered
   
   We could also add an encoding format to the `encode` and `decode` functions, 
as they already have the function signature of binary <-> utf8. However, the 
existing formats are about encoding arbitrary bytes as text, not about 
interpreting bytes as another data type.
   
   
   
   ### Additional context
   
   I'd be happy to contribute this scalar function if we decide it's a good 
solution to the problem.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org