ctsk commented on PR #16463: URL: https://github.com/apache/datafusion/pull/16463#issuecomment-2994386331
@Dandandan I don't think that heuristic makes sense in this context. The gc is introduced here mainly to shrink the data buffer vector of StringView/ByteView arrays, not to save memory.

Sadly, the condition wouldn't even trigger consistently if I used the same threshold, because the batches come from a CoalesceBatchesExec, which has already applied the same logic. That happens before the concat, but the ratio of data buffer size to referenced size remains the same afterwards.

A static threshold on the number of data buffers could make sense, but it seems fiddly to me. I've outlined in the associated issue why I prefer fixing this in arrow-rs.
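To illustrate the point about the ratio surviving concatenation, here is a minimal, std-only sketch. It is a toy model, not arrow-rs or DataFusion code: `ViewBatch`, `ratio_says_gc`, `count_says_gc`, the factor of 2 and the limit of 64 are all hypothetical, and the model assumes that concatenating view arrays keeps every input data buffer.

```rust
/// Hypothetical stand-in for a StringView/ByteView batch: only sizes matter here.
#[derive(Clone, Copy, Debug)]
struct ViewBatch {
    num_buffers: usize,      // length of the data buffer vector
    buffer_bytes: usize,     // total bytes held by the data buffers
    referenced_bytes: usize, // bytes actually referenced by the views
}

impl ViewBatch {
    /// Ratio heuristic (assumed form): gc when the buffers are more than
    /// `factor` times larger than the bytes the views reference.
    fn ratio_says_gc(&self, factor: usize) -> bool {
        self.buffer_bytes > factor * self.referenced_bytes
    }

    /// Static heuristic: gc when the data buffer vector is too long.
    fn count_says_gc(&self, max_buffers: usize) -> bool {
        self.num_buffers > max_buffers
    }
}

/// Model assumption: concat keeps all input buffers, so every size adds up.
fn concat(batches: &[ViewBatch]) -> ViewBatch {
    ViewBatch {
        num_buffers: batches.iter().map(|b| b.num_buffers).sum(),
        buffer_bytes: batches.iter().map(|b| b.buffer_bytes).sum(),
        referenced_bytes: batches.iter().map(|b| b.referenced_bytes).sum(),
    }
}

fn main() {
    // A batch that already passed the ratio check upstream (1.5x overhead).
    let batch = ViewBatch { num_buffers: 4, buffer_bytes: 48_000, referenced_bytes: 32_000 };
    let merged = concat(&[batch; 100]);

    // The ratio is unchanged by concat, so the same threshold still does not fire,
    // even though the buffer vector has grown a hundredfold.
    println!("ratio check on merged batch: {}", merged.ratio_says_gc(2));
    println!("buffers in merged batch: {}", merged.num_buffers);
    println!("count check on merged batch: {}", merged.count_says_gc(64));
}
```

In this model only the count-based check reacts to the concatenated batch, which is why a static threshold is the variant that could work here, at the cost of having to pick the limit.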