alamb opened a new issue, #6780:
URL: https://github.com/apache/arrow-rs/issues/6780

   **Describe the bug**
   Quoting @onursatici from https://github.com/apache/arrow-rs/pull/6779:
   
   > Currently, interleaving ByteViewArrays is done with the fallback implementation, which uses a MutableArrayData. The extend method on this builder copies all variadic buffers because it doesn't know whether some buffers are not referenced by any views in the array.
   >
   > Especially in DataFusion's TopK implementation, which uses a heap that interleaves Arrow arrays to produce the top k rows, the current interleave implementation results in an explosion of the variadic buffer count for byte view arrays, adding the same set of buffers over and over again. This becomes really problematic when sending such arrays over Arrow Flight, because the current encoder materialises all variadic buffers.
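
To make the failure mode concrete, here is a minimal, std-only model of the fallback behaviour described above. The types `DataBuffer`, `View` and `ViewArray` and the function `interleave_naive` are hypothetical stand-ins, not arrow-rs APIs. Real views also inline short strings, which this model ignores. Each call appends every input's buffers to the output, so repeatedly interleaving arrays that share buffers grows the buffer list even though only one distinct allocation ever exists.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a variadic data buffer (not arrow-rs's `Buffer`).
type DataBuffer = Arc<Vec<u8>>;

/// Hypothetical non-inlined view: buffer index plus byte range.
#[derive(Clone, Copy, Debug)]
struct View {
    buffer_index: usize,
    offset: usize,
    len: usize,
}

#[derive(Debug)]
struct ViewArray {
    views: Vec<View>,
    buffers: Vec<DataBuffer>,
}

impl ViewArray {
    /// Resolves a view to its bytes.
    fn value(&self, row: usize) -> &[u8] {
        let v = self.views[row];
        &self.buffers[v.buffer_index][v.offset..v.offset + v.len]
    }
}

/// Models the fallback: every input array's buffers are appended to the
/// output, and each selected view's buffer index is shifted by the position
/// where its source array's buffers start. No duplicate check is made.
fn interleave_naive(arrays: &[&ViewArray], indices: &[(usize, usize)]) -> ViewArray {
    let mut buffers = Vec::new();
    let mut base = Vec::with_capacity(arrays.len());
    for a in arrays {
        base.push(buffers.len());
        buffers.extend(a.buffers.iter().cloned());
    }
    let views = indices
        .iter()
        .map(|&(array, row)| {
            let v = arrays[array].views[row];
            View { buffer_index: base[array] + v.buffer_index, ..v }
        })
        .collect();
    ViewArray { views, buffers }
}

fn main() {
    let shared: DataBuffer = Arc::new(b"hello world, this is long".to_vec());
    let mut current = ViewArray {
        views: vec![View { buffer_index: 0, offset: 0, len: 11 }],
        buffers: vec![shared],
    };
    // TopK-like loop: repeatedly interleave the current state with itself.
    for round in 1..=5 {
        current = interleave_naive(&[&current, &current], &[(0, 0), (1, 0)]);
        println!("round {round}: buffer count = {}", current.buffers.len());
    }
    // prints "round 1: buffer count = 2" doubling up to "round 5: buffer count = 32"
    println!("{}", String::from_utf8_lossy(current.value(1))); // "hello world"
}
```

Interleaving the current state with itself is a deliberately extreme pattern; the actual growth in TopK depends on how often the arrays fed to interleave share buffers.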
   
   This also came up recently on https://github.com/apache/arrow-rs/pull/6779 from @ShiKaiWi and in a conversation with @tustvold, @XiangpengHao and myself here: https://github.com/apache/arrow-rs/pull/6427#issuecomment-2493911919
   
   **To Reproduce**
   Call `interleave` or `concat` with a bunch of `StringViewArray`s (I think)
   
   
   **Expected behavior**
   Ideally, if a buffer is already in a StringViewArray's `variadic_buffer` list, it shouldn't be added again.
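
One way to get this behaviour is to deduplicate buffers by identity while building the output buffer list, then remap each view's buffer index. The sketch below uses hypothetical std-only types (`DataBuffer`, `View`, `ViewArray`, `interleave_dedup`), not arrow-rs APIs, and treats two `Arc` handles to the same allocation as the same buffer via `Arc::as_ptr`. It is an illustration under those assumptions, not the arrow-rs implementation.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for a variadic data buffer (not arrow-rs's `Buffer`).
type DataBuffer = Arc<Vec<u8>>;

/// Hypothetical non-inlined view: buffer index plus byte range.
#[derive(Clone, Copy, Debug)]
struct View {
    buffer_index: usize,
    offset: usize,
    len: usize,
}

#[derive(Debug)]
struct ViewArray {
    views: Vec<View>,
    buffers: Vec<DataBuffer>,
}

impl ViewArray {
    /// Resolves a view to its bytes.
    fn value(&self, row: usize) -> &[u8] {
        let v = self.views[row];
        &self.buffers[v.buffer_index][v.offset..v.offset + v.len]
    }
}

/// Interleaves rows, adding each distinct buffer to the output at most once.
/// Buffers are identified by allocation address, so clones of the same `Arc`
/// map to a single output buffer.
fn interleave_dedup(arrays: &[&ViewArray], indices: &[(usize, usize)]) -> ViewArray {
    let mut buffers: Vec<DataBuffer> = Vec::new();
    // Allocation address -> position in `buffers`.
    let mut seen: HashMap<*const Vec<u8>, usize> = HashMap::new();
    // remap[array][old buffer index] -> new buffer index.
    let mut remap: Vec<Vec<usize>> = Vec::with_capacity(arrays.len());
    for a in arrays {
        let mut map = Vec::with_capacity(a.buffers.len());
        for b in &a.buffers {
            let idx = *seen.entry(Arc::as_ptr(b)).or_insert_with(|| {
                buffers.push(Arc::clone(b));
                buffers.len() - 1
            });
            map.push(idx);
        }
        remap.push(map);
    }
    let views = indices
        .iter()
        .map(|&(array, row)| {
            let v = arrays[array].views[row];
            View { buffer_index: remap[array][v.buffer_index], ..v }
        })
        .collect();
    ViewArray { views, buffers }
}

fn main() {
    let shared: DataBuffer = Arc::new(b"hello world, this is long".to_vec());
    let mut current = ViewArray {
        views: vec![View { buffer_index: 0, offset: 0, len: 11 }],
        buffers: vec![shared],
    };
    // TopK-like loop: repeatedly interleave the current state with itself.
    for round in 1..=5 {
        current = interleave_dedup(&[&current, &current], &[(0, 0), (1, 0)]);
        println!("round {round}: buffer count = {}", current.buffers.len());
    }
    // prints "round 1: buffer count = 1" through "round 5: buffer count = 1"
    println!("{}", String::from_utf8_lossy(current.value(1))); // "hello world"
}
```

In arrow-rs a `Buffer` can be a slice into a larger shared allocation, so a real fix would need an identity key (for example pointer plus length) that matches how buffers are actually shared; this sketch does not address that. It also keeps buffers that no selected view references; dropping those would be a separate step.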
   

