[GitHub] [arrow-rs] alamb commented on issue #1531: Epic: enhance the `substring` kernel

GitBox Wed, 13 Apr 2022 05:43:31 -0700


alamb commented on issue #1531:
URL: https://github.com/apache/arrow-rs/issues/1531#issuecomment-1098005732


   Thank you for your thoughts @HaoYang670 
   
   I still think switching `substring` to operate on chars rather than bytes is preferable because:
   1. For ASCII (single-byte UTF-8) text, bytes and chars are equivalent.
   2. If someone has multi-byte UTF-8 string data, byte-based substring calculations are likely subtly incorrect, and they are in danger of creating invalid UTF-8 when using `substring`.
   3. If we make byte-based `substring` safe, performance will regress, which some people will regard as backwards incompatible as well.
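The three points above can be illustrated with plain Rust strings. This is a minimal sketch using only the standard library, not the arrow-rs `substring` kernel or its API; the function names `substring_by_bytes`, `substring_by_chars` and `substring_by_bytes_checked` are hypothetical. It shows that bytes and chars agree on ASCII, that slicing by bytes can split a multi-byte character (here `é`, encoded as `0xC3 0xA9`) and produce invalid UTF-8, and that a "safe" byte-based variant has to do an extra char-boundary check for every value.

```rust
/// Hypothetical byte-based substring: slices the raw UTF-8 bytes, so the
/// result may cut a multi-byte character in half (not valid UTF-8).
fn substring_by_bytes(s: &str, start: usize, len: usize) -> &[u8] {
    let bytes = s.as_bytes();
    let end = start.saturating_add(len).min(bytes.len());
    let start = start.min(end);
    &bytes[start..end]
}

/// Hypothetical char-based substring: counts Unicode scalar values
/// (Rust `char`s), so the result is always valid UTF-8.
fn substring_by_chars(s: &str, start: usize, len: usize) -> String {
    s.chars().skip(start).take(len).collect()
}

/// Hypothetical "safe" byte-based substring: `str::get` returns `None`
/// when either offset is out of range or not on a char boundary.
/// This boundary check is extra work done for every value.
fn substring_by_bytes_checked(s: &str, start: usize, len: usize) -> Option<&str> {
    let end = start.checked_add(len)?;
    s.get(start..end)
}

fn main() {
    // Point 1: for ASCII text, bytes and chars give the same answer.
    let ascii = "hello";
    assert_eq!(substring_by_bytes(ascii, 1, 3), &b"ell"[..]);
    assert_eq!(substring_by_chars(ascii, 1, 3), "ell");

    // Point 2: "é" takes two bytes, so the first two bytes of "héllo"
    // are b'h' followed by half of "é", which is not valid UTF-8.
    let s = "héllo";
    assert!(std::str::from_utf8(substring_by_bytes(s, 0, 2)).is_err());
    assert_eq!(substring_by_chars(s, 0, 2), "hé");

    // Point 3: the checked byte version rejects the split instead.
    assert_eq!(substring_by_bytes_checked(s, 0, 2), None);
    assert_eq!(substring_by_bytes_checked(s, 0, 3), Some("hé"));

    println!("ok");
}
```

Note that the char-based version has to walk the string from the start to find the char offsets, so it is not free either; the sketch only shows why the two definitions diverge on multi-byte data.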
   
   Perhaps other maintainers such as @nevi-me, @viirya, @sunchao, @tustvold and @jhorstmann have some thoughts about whether and how we should change `substring` to handle UTF-8?
   
   

