seddonm1 opened a new pull request #9366:
URL: https://github.com/apache/arrow/pull/9366


   This PR renames the `length` kernel to `octet_length` to make clear what it 
returns and to distinguish it from `character_length`. The term `bytes` could 
have been used instead of `octet`, but `octet` was chosen because ANSI SQL 
defines an `octet_length` function.
   
   I have created the correct `character_length` function as part of 
https://github.com/apache/arrow/pull/9243.
   
   **Issue**
   The Rust `length` kernel currently counts the number of bytes (octets), which 
may differ from the number of characters because Arrow strings are UTF-8 
encoded. For example, the `length` kernel returns 5 (bytes) for the string 
`josé`, not 4 (characters).
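
   The difference can be illustrated with a minimal sketch that uses only the 
Rust standard library. The two helper functions below are hypothetical 
stand-ins named after the kernels, not the Arrow implementations, and 
"character" is assumed here to mean a Unicode scalar value:

```rust
/// Hypothetical helper: number of bytes (octets) in the UTF-8 encoding of `s`.
fn octet_length(s: &str) -> usize {
    // `str::len` reports the length in bytes, not in characters.
    s.len()
}

/// Hypothetical helper: number of Unicode scalar values in `s`.
fn character_length(s: &str) -> usize {
    // `chars()` iterates over Unicode scalar values, so this walks the string.
    s.chars().count()
}
```

   With `josé` written using the precomposed `é` (U+00E9, two bytes in UTF-8), 
`octet_length("jos\u{e9}")` gives 5 while `character_length("jos\u{e9}")` 
gives 4. For pure ASCII input the two values coincide.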


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]