alamb commented on PR #6154:
URL: https://github.com/apache/arrow-rs/pull/6154#issuecomment-2256644528

   Thanks @chloro-pn. 🙏   We have gone back and forth on this idea while 
integrating StringView into DataFusion.
   
   `StringViewArray` has a `gc` method, but it requires an extra copy of 
the views:
   
   
https://docs.rs/arrow/latest/arrow/array/struct.GenericByteViewArray.html#method.gc
   
   In fact, @XiangpengHao used this API in 
https://github.com/apache/datafusion/pull/11587 to solve exactly the problem 
you are describing (too much unused data in the buffers).
   
   However, my worry is that any heuristic for deciding when to 
compact the string data / buffers will not be ideal for every use case, 
and one principle of this crate is to give the user maximal control over 
performance.
   
   So I would like to propose we support two different modes for filter 
kernels: 
   1. Filter only views
   2. Filter the views and copy the matching strings to a new buffer
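
   To make the trade-off between the two modes concrete, here is a minimal sketch using only the Rust standard library. It is a toy model, not the arrow-rs implementation: `ToyViewArray`, `View`, `filter_views_only` and `filter_and_compact` are hypothetical names, each view is just an (offset, length) pair into a single shared buffer, and the real view layout (including inlining of short strings) is ignored.

```rust
use std::sync::Arc;

/// Toy stand-in for a view: where one string lives in the shared buffer.
/// (Assumption: real views have a different, fixed-width layout.)
#[derive(Debug, Clone, Copy, PartialEq)]
struct View {
    offset: usize,
    len: usize,
}

/// Toy stand-in for a string view array: views plus one shared data buffer.
#[derive(Debug, Clone)]
struct ToyViewArray {
    views: Vec<View>,
    buffer: Arc<Vec<u8>>,
}

impl ToyViewArray {
    /// Build an array by appending every string to a single buffer.
    fn from_strs(values: &[&str]) -> Self {
        let mut views = Vec::with_capacity(values.len());
        let mut buffer = Vec::new();
        for v in values {
            views.push(View { offset: buffer.len(), len: v.len() });
            buffer.extend_from_slice(v.as_bytes());
        }
        Self { views, buffer: Arc::new(buffer) }
    }

    /// Read the i-th string through its view.
    fn value(&self, i: usize) -> &str {
        let v = self.views[i];
        std::str::from_utf8(&self.buffer[v.offset..v.offset + v.len]).unwrap()
    }

    /// Mode 1: keep only the selected views. The buffer is shared, not
    /// copied, so bytes of filtered-out strings stay alive.
    fn filter_views_only(&self, mask: &[bool]) -> Self {
        let views = self
            .views
            .iter()
            .zip(mask)
            .filter(|(_, keep)| **keep)
            .map(|(v, _)| *v)
            .collect();
        Self { views, buffer: Arc::clone(&self.buffer) }
    }

    /// Mode 2: keep the selected views and copy the matching strings into
    /// a new, compact buffer. Costs a copy but frees the unused bytes.
    fn filter_and_compact(&self, mask: &[bool]) -> Self {
        let mut views = Vec::new();
        let mut buffer = Vec::new();
        for (v, keep) in self.views.iter().zip(mask) {
            if *keep {
                views.push(View { offset: buffer.len(), len: v.len });
                buffer.extend_from_slice(&self.buffer[v.offset..v.offset + v.len]);
            }
        }
        Self { views, buffer: Arc::new(buffer) }
    }
}
```

   In this model, mode 1 is cheap but keeps the whole original buffer referenced no matter how few rows survive the filter, while mode 2 pays for a copy of the surviving bytes and lets the original buffer be dropped. Which one is better depends on the selectivity of the filter and on what happens to the result afterwards, which is why leaving the choice to the caller seems attractive.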
   
   @XiangpengHao I wonder if you have any thoughts to add here?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
