a10y opened a new issue, #6167:
URL: https://github.com/apache/arrow-rs/issues/6167

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   Related to #6163 
   
   The `take` kernel for StringView and BinaryView is implemented using 
`GenericByteViewArray::new()`, a safe constructor that does full UTF-8 
validation for all non-inlined strings in the buffers. This is kind of silly, 
given we're not even constructing new string data, just copying the existing 
buffers from arrays that are already known to contain well-formed UTF-8 values.
   
   In Vortex, I'm seeing this show up in the profiles for TPC-H queries as one 
of the more prominent items, in many cases causing a ~50% regression versus the 
prior version.
   
   
![image](https://github.com/user-attachments/assets/9fd0ac0a-2e92-4bf1-a9bb-51be4b5b8844)
   
   **Describe the solution you'd like**
   
   The `take_byte` kernel for Utf8/Binary arrays constructs an ArrayData 
instance and does not perform UTF-8 validation, since we're taking from an 
already known-good Utf8 array. The `take` kernel for StringView/BinaryView 
should do the same and skip validation.
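
   To illustrate the idea, here is a minimal, self-contained sketch using a toy model of a view array. It is not the arrow-rs implementation: `ToyStringViewArray`, `View`, `try_new`, and `new_unchecked` are hypothetical names made up for this example, and the real view layout (16-byte views, inlined short strings, multiple buffers, null buffers) is deliberately left out. The point is only that `take` gathers views and shares the existing buffer, so it can go through an unchecked constructor instead of revalidating every string.

```rust
use std::sync::Arc;

/// Toy stand-in for a string view: (offset, length) into a shared buffer.
/// Hypothetical; not arrow-rs's real view layout.
#[derive(Clone, Copy, Debug, PartialEq)]
struct View {
    offset: usize,
    len: usize,
}

/// Toy stand-in for a StringView array: views plus one shared data buffer.
struct ToyStringViewArray {
    views: Vec<View>,
    buffer: Arc<[u8]>,
}

impl ToyStringViewArray {
    /// Safe constructor: checks that every view references valid UTF-8.
    /// The cost grows with the total number of bytes referenced.
    fn try_new(views: Vec<View>, buffer: Arc<[u8]>) -> Result<Self, std::str::Utf8Error> {
        for v in &views {
            std::str::from_utf8(&buffer[v.offset..v.offset + v.len])?;
        }
        Ok(Self { views, buffer })
    }

    /// Unchecked constructor: performs no validation at all.
    ///
    /// # Safety
    /// Every view must be in bounds and reference valid UTF-8 in `buffer`.
    unsafe fn new_unchecked(views: Vec<View>, buffer: Arc<[u8]>) -> Self {
        Self { views, buffer }
    }

    fn value(&self, i: usize) -> &str {
        let v = self.views[i];
        // SAFETY: both constructors require in-bounds, valid UTF-8 views.
        unsafe { std::str::from_utf8_unchecked(&self.buffer[v.offset..v.offset + v.len]) }
    }

    /// `take` that gathers views and shares the buffer without revalidating.
    fn take(&self, indices: &[usize]) -> Self {
        let views = indices.iter().map(|&i| self.views[i]).collect();
        // SAFETY: every gathered view comes from `self`, which already upholds
        // the invariant, and the buffer is reused unchanged.
        unsafe { Self::new_unchecked(views, Arc::clone(&self.buffer)) }
    }
}
```

   In this sketch the work done by `take` is proportional to the number of indices rather than to the number of string bytes, which is the property the issue is asking for.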
   
   **Describe alternatives you've considered**
   
   
   **Additional context**
   

