alamb commented on PR #9815: URL: https://github.com/apache/arrow-datafusion/pull/9815#issuecomment-2037323653
> If I summarize https://github.com/apache/arrow-datafusion/issues/9792, the problem is that when a Limit exists above CoalesceBatches, CoalesceBatches waits until all rows are collected, many of which may not be used after the Limit. Therefore, we need CoalesceBatches to be aware of the Limit's fetch count so that, once that many rows are collected, it can return them without waiting for more.

Right -- my point was that changing `CoalesceBatches` seems like somewhat of a workaround for not handling the limit in `StreamingTableExec`. It seems like if we handled the limit in `StreamingTableExec`, then:

1. It could be more efficient, as the `StreamingTableExec` could stop as soon as the limit was hit
2. We would not need any changes to `CoalesceBatches`
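To illustrate the idea of handling the limit at the source, here is a minimal, self-contained sketch. It does not use DataFusion's actual API: `Batch` and `LimitedSource` are hypothetical names, and a plain `Iterator` stands in for a record batch stream. The point is only that a source that knows the fetch count can truncate the last batch and stop polling its input once the limit is hit.

```rust
/// Hypothetical stand-in for a record batch: just a vector of row values.
type Batch = Vec<i32>;

/// Hypothetical wrapper around a batch source that is aware of a row limit
/// (`fetch`) and stops polling its input once that many rows were emitted.
struct LimitedSource<I: Iterator<Item = Batch>> {
    input: I,
    remaining: usize,
}

impl<I: Iterator<Item = Batch>> LimitedSource<I> {
    fn new(input: I, fetch: usize) -> Self {
        Self { input, remaining: fetch }
    }
}

impl<I: Iterator<Item = Batch>> Iterator for LimitedSource<I> {
    type Item = Batch;

    fn next(&mut self) -> Option<Batch> {
        // Limit already satisfied: do not poll the (possibly unbounded) input.
        if self.remaining == 0 {
            return None;
        }
        let mut batch = self.input.next()?;
        // Keep only as many rows as are still needed.
        batch.truncate(self.remaining);
        self.remaining -= batch.len();
        Some(batch)
    }
}
```

With an unbounded input that yields three-row batches and a fetch of 5, this emits one full batch and one two-row batch, and the input is polled only twice. Nothing downstream (such as a coalescing step) needs to know about the limit in this sketch, since the stream simply ends.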
