comphead commented on issue #14238: URL: https://github.com/apache/datafusion/issues/14238#issuecomment-2607735745
The direction proposed by @berkaysynnada is worth discussing. By their nature, joins do not guarantee the output batch size in records. An output batch can be much smaller, or even empty, because of filtering, and it can be much larger because of join explosions. The question to discuss is how we can make the output batches after joins more uniform and closer to the configured `batch_size`.

One option is to add `BatchSplitter` or `BatchCoalesce` plan nodes after the join. Another is to align the batches inside the join itself, either by embedding a coalescer/splitter or with a custom implementation.
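
To make the coalescer/splitter idea concrete, here is a minimal sketch in plain Rust. It is not DataFusion code: it assumes a batch can be modeled as a `Vec<i64>` of rows, and the `Batch` alias and the `rebatch` function are hypothetical names introduced only for illustration. It buffers incoming rows and emits a batch whenever `batch_size` rows have accumulated, so small or empty inputs get coalesced and oversized inputs get split.

```rust
/// Hypothetical stand-in for a record batch: just a vector of row values.
/// (Assumption for illustration; a real engine would use Arrow record batches.)
type Batch = Vec<i64>;

/// Re-chunk unevenly sized batches so that every output batch holds exactly
/// `batch_size` rows, except possibly the last one, which may be shorter.
fn rebatch(input: Vec<Batch>, batch_size: usize) -> Vec<Batch> {
    assert!(batch_size > 0, "batch_size must be positive");

    let mut output: Vec<Batch> = Vec::new();
    let mut buffer: Batch = Vec::with_capacity(batch_size);

    for batch in input {
        // Empty batches contribute nothing and are skipped implicitly.
        for row in batch {
            buffer.push(row);
            if buffer.len() == batch_size {
                // Buffer is full: emit it and start a fresh one.
                let full = std::mem::replace(&mut buffer, Vec::with_capacity(batch_size));
                output.push(full);
            }
        }
    }

    // Flush whatever is left as a final, possibly short, batch.
    if !buffer.is_empty() {
        output.push(buffer);
    }
    output
}
```

For example, with `batch_size = 4`, inputs of 0, 2, 8, and 1 rows would come out as batches of 4, 4, and 3 rows, with row order preserved. A real implementation would work on Arrow record batches and would likely avoid copying row by row; the sketch only shows the buffering logic that either a separate plan node or the join itself would need.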