XiangpengHao commented on PR #11587: URL: https://github.com/apache/datafusion/pull/11587#issuecomment-2250691291
While working on the [blog post](#11603), I came up with a better heuristic that leads to further performance improvement (e.g., an additional 5%). The idea is to calculate an `ideal_buffer_size`, and if the actual buffer size is twice as large, we do gc. We also use the `ideal_buffer_size` to set the optimal `block_size` value, so that we never waste a single byte. Calculating the `ideal_buffer_size` requires traversing the views, but this is actually cheap because the batches are pretty small for low-cardinality filters, which is the most common case.

cc @alamb @2010YOUY01
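The heuristic described above can be sketched roughly as follows. This is not the code from the PR: it is a minimal, standalone illustration that models each view as a raw `u128` following the Arrow view layout (length in the low 32 bits, values of 12 bytes or fewer stored inline). The function names `view_len`, `should_gc`, and `view_with_len` are hypothetical, and the strict `>` comparison against twice the ideal size is an assumption about how "twice as large" is applied.

```rust
/// In the Arrow view layout, values of at most 12 bytes are stored inline
/// in the view itself and occupy no space in the data buffers.
const MAX_INLINE_LEN: u32 = 12;

/// The value length lives in the low 32 bits of each 128-bit view.
fn view_len(view: u128) -> u32 {
    view as u32
}

/// Bytes the data buffers would need if they held only the values that the
/// views actually reference (inline values contribute nothing).
fn ideal_buffer_size(views: &[u128]) -> usize {
    views
        .iter()
        .map(|v| view_len(*v))
        .filter(|len| *len > MAX_INLINE_LEN)
        .map(|len| len as usize)
        .sum()
}

/// Hypothetical decision function: compact when the buffers hold more than
/// twice the bytes the views need (the exact comparison is an assumption).
fn should_gc(views: &[u128], actual_buffer_size: usize) -> bool {
    actual_buffer_size > 2 * ideal_buffer_size(views)
}

/// Hypothetical helper for experiments: a view carrying only a length,
/// with the prefix, buffer index, and offset fields left at zero.
fn view_with_len(len: u32) -> u128 {
    len as u128
}
```

For example, views of lengths 5, 20, 100, and 12 give an ideal size of 120 bytes, since only the 20-byte and 100-byte values live outside the views. With a 1000-byte buffer, `should_gc` returns true. The same 120 could then be used as the block size when rebuilding, so the new buffer is allocated at exactly the size needed.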