seddonm1 opened a new pull request #9366: URL: https://github.com/apache/arrow/pull/9366
This PR renames the `length` kernel to `octet_length` so that the name states what the kernel returns and distinguishes it from `character_length`. The term `bytes` could have been used instead of `octet`, but `octet_length` was chosen because ANSI SQL already defines a function with that name. I have implemented a correct `character_length` function as part of https://github.com/apache/arrow/pull/9243.

**Issue**

The Rust `length` kernel currently counts the number of bytes (octets), which may or may not equal the number of characters because Arrow uses UTF-8 encoding. For example, the `length` kernel returns 5 (bytes) for a string like `josé`, which has only 4 characters.
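The byte/character mismatch described above can be reproduced with plain Rust string methods. This is a minimal sketch, not the Arrow kernel code: the free functions `octet_length` and `character_length` below are hypothetical stand-ins that only borrow the names discussed in this PR, and they operate on a single `&str` rather than on an Arrow array.

```rust
/// Hypothetical stand-in: number of bytes (octets) in the UTF-8 encoding.
fn octet_length(s: &str) -> usize {
    s.len()
}

/// Hypothetical stand-in: number of Unicode scalar values (`char`s).
fn character_length(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    // "josé" with a precomposed é (U+00E9), which takes two bytes in UTF-8.
    let s = "jos\u{e9}";
    println!("octet_length     = {}", octet_length(s)); // 5
    println!("character_length = {}", character_length(s)); // 4
}
```

Counting `char`s counts Unicode scalar values, so a string that uses combining marks (for example `e` followed by U+0301) would give a larger character count than a reader might expect.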
