Mike Seddon created ARROW-11434:
-----------------------------------

             Summary: Length kernel returns bytes not character length
                 Key: ARROW-11434
                 URL: https://issues.apache.org/jira/browse/ARROW-11434
             Project: Apache Arrow
          Issue Type: Bug
          Components: Rust, Rust - DataFusion
            Reporter: Mike Seddon
            Assignee: Mike Seddon


The Rust `length` kernel currently counts the number of bytes (octets) rather
than the number of characters. Because Arrow strings are UTF-8 encoded, the two
differ for any string that contains multi-byte characters.

For example, the result of the `length` kernel on a string like `josé` is 5
(bytes) rather than 4 (characters), because `é` occupies two bytes in UTF-8.
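
The difference can be reproduced with the Rust standard library alone. The
sketch below is only an illustration: `byte_length` and `char_length` are
hypothetical helper names, not the Arrow kernel API, and "characters" is
assumed here to mean Unicode scalar values (`char`), not grapheme clusters.

```rust
/// Length of the UTF-8 encoding in bytes (octets).
/// This is the quantity the issue says the kernel currently reports.
pub fn byte_length(s: &str) -> usize {
    s.len()
}

/// Length in Unicode scalar values, one common reading of "characters".
/// Assumption: grapheme clusters are not considered here.
pub fn char_length(s: &str) -> usize {
    s.chars().count()
}
```

With `é` written as the single code point U+00E9, `byte_length("jos\u{e9}")`
gives 5 and `char_length("jos\u{e9}")` gives 4. For pure ASCII input the two
functions agree.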



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
