Re: [PR] Implement LimitPushDown for ExecutionPlan [arrow-datafusion]

via GitHub Thu, 04 Apr 2024 07:12:51 -0700


alamb commented on PR #9815:
URL: https://github.com/apache/arrow-datafusion/pull/9815#issuecomment-2037323653


   > If I summarize https://github.com/apache/arrow-datafusion/issues/9792, the 
problem is that when a Limit exists above CoalesceBatches, CoalesceBatches waits 
until all rows are collected, some of which may not be used after the Limit. 
Therefore, we need CoalesceBatches to sense the fetch count of the Limit, and 
once that many rows are collected, it should be able to return them without 
waiting for more.
   
   Right -- my point was that changing `CoalesceBatches` seems like somewhat of a 
workaround for a limit in `StreamingTableExec` -- it seems like if we handled 
the limit in `StreamingTableExec` itself, then:
   1. It could be more efficient, as `StreamingTableExec` could stop as 
soon as the limit was hit
   2. We would not need any changes to `CoalesceBatches`
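
To illustrate the idea of handling the limit at the source, here is a minimal sketch in plain Rust. It is not DataFusion code: `Batch` and `LimitedBatches` are hypothetical names, and a `Vec<i32>` stands in for a record batch. The adapter truncates the final batch and stops pulling from the underlying source once `fetch` rows have been produced, which is the behavior the two points above rely on.

```rust
/// Hypothetical stand-in for a record batch: just a vector of row values.
type Batch = Vec<i32>;

/// Hypothetical adapter that yields batches from `inner` until `fetch` rows
/// have been produced, then stops polling `inner` entirely.
struct LimitedBatches<I: Iterator<Item = Batch>> {
    inner: I,
    /// Number of rows that may still be emitted.
    remaining: usize,
}

impl<I: Iterator<Item = Batch>> LimitedBatches<I> {
    fn new(inner: I, fetch: usize) -> Self {
        Self { inner, remaining: fetch }
    }
}

impl<I: Iterator<Item = Batch>> Iterator for LimitedBatches<I> {
    type Item = Batch;

    fn next(&mut self) -> Option<Batch> {
        // Limit already satisfied: do not touch the source again.
        if self.remaining == 0 {
            return None;
        }
        let mut batch = self.inner.next()?;
        // Cut the last batch down so that exactly `fetch` rows are emitted.
        batch.truncate(self.remaining);
        self.remaining -= batch.len();
        Some(batch)
    }
}
```

With the limit applied here, a downstream operator simply sees the input end early and can flush whatever it has buffered, so it would not need to know about the fetch count itself.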


