2010YOUY01 commented on PR #17105: URL: https://github.com/apache/datafusion/pull/17105#issuecomment-3182704380
For the `tpch_mem` slowdown, another possible reason could be unnecessary copies for batches that are exactly `batch_size`. For certain operators, there might already be an internal mechanism to ensure their output is exactly `batch_size`. From a quick look at the implementation, the old version could pass such batches through directly, whereas this PR forces them to be copied.

Another potential improvement: could we make this pass-through threshold more lenient? For example, if the coalescer receives a batch with size >= `batch_size / 2`, it could pass it through without coalescing. In such cases, the output size is already large enough to benefit from vectorization, so the extra concatenation might not add much value.
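To make the lenient pass-through idea concrete, here is a rough sketch. It is not the code in this PR or the arrow-rs coalescer: `LenientCoalescer`, `passthrough_min`, and `copied_rows` are hypothetical names, and a plain `Vec<i32>` stands in for a `RecordBatch`. It also assumes pass-through is only allowed when nothing is buffered, so that row order is preserved.

```rust
/// Stand-in for an Arrow `RecordBatch` (assumption: a plain vector of rows).
type Batch = Vec<i32>;

/// Hypothetical coalescer that forwards large-enough batches without copying.
struct LenientCoalescer {
    batch_size: usize,
    /// Batches with at least this many rows skip coalescing.
    passthrough_min: usize,
    /// Rows from small batches waiting to be combined.
    buffer: Batch,
    /// Number of rows that went through the copy path (for illustration).
    copied_rows: usize,
}

impl LenientCoalescer {
    fn new(batch_size: usize) -> Self {
        assert!(batch_size > 0, "batch_size must be positive");
        Self {
            batch_size,
            // The threshold suggested in the comment: batch_size / 2.
            passthrough_min: (batch_size / 2).max(1),
            buffer: Vec::with_capacity(batch_size),
            copied_rows: 0,
        }
    }

    /// Push one input batch and return any output batches that are ready.
    fn push(&mut self, batch: Batch) -> Vec<Batch> {
        // Fast path: nothing buffered and the batch is already large enough,
        // so hand it back untouched (no copy, no concatenation).
        if self.buffer.is_empty() && batch.len() >= self.passthrough_min {
            return vec![batch];
        }

        // Slow path: copy the rows into the buffer and emit full batches.
        self.copied_rows += batch.len();
        self.buffer.extend(batch);

        let mut out = Vec::new();
        while self.buffer.len() >= self.batch_size {
            // Keep the first `batch_size` rows as output, the rest stays buffered.
            let rest = self.buffer.split_off(self.batch_size);
            out.push(std::mem::replace(&mut self.buffer, rest));
        }
        out
    }

    /// Flush whatever is left at end of input.
    fn finish(&mut self) -> Option<Batch> {
        if self.buffer.is_empty() {
            None
        } else {
            Some(std::mem::take(&mut self.buffer))
        }
    }
}
```

With `batch_size = 8`, a batch of 8 or 4 rows arriving on an empty buffer is returned as-is and `copied_rows` stays at 0, while a 3-row batch is buffered and later combined with whatever follows. One trade-off of this sketch: once something is buffered, even a large batch takes the copy path; flushing the small buffer first would avoid the copy at the cost of emitting an undersized batch.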