tustvold opened a new issue, #1815:
URL: https://github.com/apache/arrow-rs/issues/1815

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   `ArrayData::validate_utf8` calls `std::str::from_utf8` for each slice in a 
StringArray. This is correct, but potentially suboptimal.
   
   **Describe the solution you'd like**
   
   A trick we use in other places to make this faster is to:
   
   * Validate that the entire values buffer is itself valid UTF-8
   * Check that `str::is_char_boundary` is true for each offset
   
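   This is typically orders of magnitude faster, since UTF-8 validation runs once over a contiguous buffer instead of once per (often very short) string.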
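
The two checks can be sketched as below. This is only an illustration, not the arrow-rs implementation: `validate_utf8_fast` is a hypothetical name, offsets are taken as `usize` rather than Arrow's `i32`/`i64`, and the offsets are assumed to be non-decreasing and to span the whole values buffer (both checked elsewhere). If the buffer as a whole is valid UTF-8 and every offset lands on a character boundary, then every slice between consecutive offsets is valid UTF-8 too.

```rust
/// Hypothetical helper, not part of arrow-rs: validates a string array's
/// values buffer in two passes instead of calling `from_utf8` per slice.
///
/// Assumes `offsets` is non-decreasing and covers the whole of `values`.
fn validate_utf8_fast(values: &[u8], offsets: &[usize]) -> Result<(), String> {
    // Pass 1: a single UTF-8 validation over the entire values buffer.
    let s = std::str::from_utf8(values)
        .map_err(|e| format!("values buffer is not valid UTF-8: {e}"))?;

    // Pass 2: each offset must fall on a char boundary, otherwise some
    // slice would start or end in the middle of a multi-byte character.
    // `is_char_boundary` also returns false for offsets past the end.
    for (i, &offset) in offsets.iter().enumerate() {
        if !s.is_char_boundary(offset) {
            return Err(format!(
                "offset {offset} at index {i} is not a char boundary"
            ));
        }
    }
    Ok(())
}
```

Note that this sketch is stricter than per-slice validation when the buffer holds bytes that no offset refers to (for example in a sliced array). A real implementation would probably restrict the first pass to the range between the first and last offset.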
   
   **Describe alternatives you've considered**
   
   We could choose not to do this and keep the existing per-slice validation.
   

