alamb commented on issue #1531: URL: https://github.com/apache/arrow-rs/issues/1531#issuecomment-1098005732
Thank you for your thoughts @HaoYang670. I still think switching `substring` to use chars is preferable because:

1. For ASCII (single-byte UTF-8) text, bytes and chars are equivalent.
2. If someone has multi-byte UTF-8 string data, the substring calculations are likely subtly incorrect, and they are in danger of creating invalid UTF-8 when using `substring`.
3. If we make `substring` (by bytes) safe, performance will regress, which some people will regard as backwards incompatible as well.

Perhaps other maintainers such as @nevi-me, @viirya, @sunchao, @tustvold and @jhorstmann have some thoughts about if/how we should change `substring` to handle UTF-8?
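To illustrate points 1 and 2, here is a minimal sketch using only the Rust standard library. It is not the arrow `substring` kernel, which works over array offset buffers. The helpers `substring_by_bytes` and `substring_by_chars` are hypothetical names used only for this comparison.

```rust
/// Hypothetical byte-offset substring. Returns None when the byte range is
/// not valid UTF-8 (for example, when it splits a multi-byte character).
fn substring_by_bytes(s: &str, start: usize, len: usize) -> Option<&str> {
    let bytes = s.as_bytes();
    // Clamp the range to the string; an overflowing start + len yields None.
    let end = start.checked_add(len)?.min(bytes.len());
    let start = start.min(end);
    // Validation is the extra cost that makes a byte-based version "safe".
    std::str::from_utf8(&bytes[start..end]).ok()
}

/// Hypothetical char-offset substring. `start` and `len` count Unicode
/// scalar values (Rust `char`s), so the result is always valid UTF-8.
fn substring_by_chars(s: &str, start: usize, len: usize) -> &str {
    // Map the n-th char to its byte offset, or s.len() if past the end.
    let byte_at = |n: usize| s.char_indices().nth(n).map_or(s.len(), |(i, _)| i);
    let begin = byte_at(start);
    let end = byte_at(start.saturating_add(len));
    &s[begin..end]
}

fn main() {
    // Point 1: for ASCII text, byte and char offsets agree.
    assert_eq!(substring_by_bytes("hello", 1, 3), Some("ell"));
    assert_eq!(substring_by_chars("hello", 1, 3), "ell");

    // Point 2: 'é' takes 2 bytes in UTF-8.
    let text = "héllo";
    // Bytes 1..2 cut 'é' in half, which is invalid UTF-8.
    assert_eq!(substring_by_bytes(text, 1, 1), None);
    // Bytes 1..4 are valid but hold only 2 characters, not 3 (subtly wrong).
    assert_eq!(substring_by_bytes(text, 1, 3), Some("él"));
    // Char offsets always return whole characters.
    assert_eq!(substring_by_chars(text, 1, 1), "é");
    assert_eq!(substring_by_chars(text, 1, 3), "éll");
    println!("ok");
}
```

Both helpers do work that a plain byte slice avoids. The byte version must validate its output, and the char version must walk the string to find character boundaries. That extra work is the kind of cost behind the performance concern in point 3.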
