alamb opened a new issue, #6408:
URL: https://github.com/apache/arrow-rs/issues/6408

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   While working on the DataFusion PR 
https://github.com/apache/datafusion/pull/12092, which switched DataFusion to 
use `StringViewArray` rather than `StringArray`, I found that one of the 
queries got much slower. 
   
   Profiling revealed that the time difference was almost entirely explained by 
the time spent in `StringViewArray::slice()` 
   
   ![Screenshot 2024-09-16 at 4 44 50 
PM](https://github.com/user-attachments/assets/efe90a30-1188-4af4-822c-b584b177770a)
   
   Here is the flamegraph with `StringArray`:
   
![flamegraph-main](https://github.com/user-attachments/assets/00216a16-7688-4cbb-bc96-372841bb157c)
   
   Here is the same query with `StringViewArray`:
   
![flamegraph-string-view](https://github.com/user-attachments/assets/1a314737-a0ba-4f00-8a6a-0a62626a01ce)
   
   I am pretty sure the additional time is due to the time spent allocating / 
copying / deallocating the `Vec`s of buffers here:
   
   
https://github.com/apache/arrow-rs/blob/341ec357e74e897f50250930b44f453bce54a19a/arrow-array/src/array/byte_view_array.rs#L118
   
   By contrast, calling `slice` on a `StringArray` requires only a few `Arc` 
increments:
   
   
https://github.com/apache/arrow-rs/blob/3490639252294215c7ee05990d82b43e2cd097a6/arrow-array/src/array/byte_array.rs#L88-L91
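
   To illustrate the difference in cost, here is a minimal, std-only sketch. It 
is a simplified model, not the arrow-rs implementation: `DataBuffer`, 
`VecBacked`, and `ArcBacked` are hypothetical names invented for this example. 
`VecBacked` keeps its list of data buffers in a `Vec`, so slicing has to clone 
that `Vec`. `ArcBacked` assumes the list could instead be shared behind a 
single `Arc`, which is one possible way to make slicing non-allocating.

```rust
use std::sync::Arc;

// Hypothetical stand-in for one variable-length data buffer
// (not arrow-rs's actual `Buffer` type).
type DataBuffer = Arc<[u8]>;

// Model of a view array whose list of data buffers lives in a `Vec`.
struct VecBacked {
    views: Arc<[u128]>, // one fixed-size view per string, shared via `Arc`
    offset: usize,
    len: usize,
    buffers: Vec<DataBuffer>,
}

impl VecBacked {
    fn new(views: Arc<[u128]>, buffers: Vec<DataBuffer>) -> Self {
        let len = views.len();
        Self { views, offset: 0, len, buffers }
    }

    // Slicing clones the `Vec`: one heap allocation plus one refcount
    // increment per data buffer, and a matching free when the slice is dropped.
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Self {
            views: Arc::clone(&self.views),
            offset: self.offset + offset,
            len,
            buffers: self.buffers.clone(),
        }
    }
}

// Assumed alternative: the whole buffer list is shared behind one `Arc`.
struct ArcBacked {
    views: Arc<[u128]>,
    offset: usize,
    len: usize,
    buffers: Arc<[DataBuffer]>,
}

impl ArcBacked {
    fn new(views: Arc<[u128]>, buffers: Vec<DataBuffer>) -> Self {
        let len = views.len();
        // The list is moved into a shared allocation once, at construction.
        Self { views, offset: 0, len, buffers: Arc::from(buffers) }
    }

    // Slicing is two refcount increments and no allocation, regardless of
    // how many data buffers the array references.
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Self {
            views: Arc::clone(&self.views),
            offset: self.offset + offset,
            len,
            buffers: Arc::clone(&self.buffers),
        }
    }
}
```

   In this model the cost of `VecBacked::slice` grows with the number of data 
buffers, while `ArcBacked::slice` stays constant. Whether an `Arc`-shared 
buffer list is the right fit for arrow-rs is a separate design question.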
   
   **Describe the solution you'd like**
   I would like `StringViewArray::slice` to be faster, ideally without allocating.
   
   
   **Describe alternatives you've considered**
   
   
   We can (and probably should) change DataFusion not to use `slice` in this 
case (I am working on filing a ticket for this), but I think making `slice` 
faster / non-allocating for `StringViewArray` will be useful in general.
   
   


