Re: [PR] GC `StringViewArray` in `CoalesceBatchesStream` [datafusion]

via GitHub Thu, 25 Jul 2024 08:32:19 -0700


XiangpengHao commented on PR #11587:
URL: https://github.com/apache/datafusion/pull/11587#issuecomment-2250691291


   While working on the [blog post](#11603), I came up with a better heuristic 
that leads to a further performance improvement (e.g., an additional 5%).
   
   The idea is to calculate an `ideal_buffer_size`, and if the actual buffer 
size is twice as large, then we do gc.
   We also use the `ideal_buffer_size` to set the optimal `block_size` value, so that 
we never waste a single byte.
   
   Calculating the `ideal_buffer_size` requires traversing the views, but this is 
actually cheap because the batches are pretty small for low cardinality filters, 
which is the case most of the time.
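
   A minimal sketch of the heuristic, written against plain `u128` views rather than the real `StringViewArray` API so that it stands alone. It relies on the Arrow view layout, where the string length sits in the low 32 bits of each view and strings of at most 12 bytes are inlined. The function names and the strict `>` comparison are assumptions for illustration, not necessarily what the PR implements.

```rust
/// Strings of at most this many bytes are stored inline in the 16-byte view
/// and take no space in the data buffers (Arrow view layout).
const MAX_INLINE_LEN: u32 = 12;

/// Bytes a perfectly compacted array would need in its data buffers:
/// the summed lengths of all non-inlined strings.
fn ideal_buffer_size(views: &[u128]) -> usize {
    views
        .iter()
        .map(|v| *v as u32) // the length is the low 32 bits of the view
        .filter(|len| *len > MAX_INLINE_LEN)
        .map(|len| len as usize)
        .sum()
}

/// Hypothetical helper: gc once the actual buffers exceed twice the ideal size.
/// Whether the real check is `>` or `>=` is an assumption here.
fn should_gc(actual_buffer_size: usize, ideal: usize) -> bool {
    actual_buffer_size > ideal * 2
}
```

   If gc is triggered, the same `ideal_buffer_size` can be passed as the block size of the builder that produces the compacted array, so the new buffer is allocated at exactly the size needed.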
   
   cc @alamb @2010YOUY01  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


