berkaysynnada commented on PR #9815:
URL: 
https://github.com/apache/arrow-datafusion/pull/9815#issuecomment-2036369642

   > Thanks @Lordworms -- I took a quick look at this PR
   > 
   > I am probably missing something obvious but I don't understand the need for the pushdown pass in the physical optimizer.
   > 
   > If the use case is to get a limit closer to `StreamingTableExec`, then maybe we can push the `fetch` to the `CoalesceBatchesExec` rather than the `StreamingTableExec`?
   > 
   > It seems to me that a limit in the StreamingTable exec can likely be implemented more efficiently, _and_ would already be handled by the existing Limit pushdown in the LogicalPlan.
   > 
   > Maybe @berkaysynnada or @mustafasrepo have some more context
   
   Thanks @alamb for the feedback. @Lordworms's strategy is intuitive and reasonable, but we may need a different way to solve the problem.
   
   To summarize https://github.com/apache/arrow-datafusion/issues/9792: when a `Limit` exists above `CoalesceBatches`, `CoalesceBatches` keeps waiting to collect rows that may never be used once the `Limit` is applied. Therefore, `CoalesceBatches` needs to be aware of the fetch count of the `Limit`, so that once that many rows have been collected it can return them without waiting for more.
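
   To make the idea concrete, here is a minimal sketch of a fetch-aware coalescer. It is not DataFusion code: `FetchAwareCoalescer`, its fields, and its methods are hypothetical names used only for illustration, and a "batch" is modeled as a plain `Vec<u32>` of rows rather than an Arrow `RecordBatch`. The point is only the control flow: flush as soon as the buffered rows cover the remaining fetch, instead of waiting for the target batch size.

```rust
/// Hypothetical, simplified stand-in for a batch-coalescing operator.
/// None of these names are DataFusion APIs; a batch is just a Vec<u32>.
struct FetchAwareCoalescer {
    /// Number of buffered rows that normally triggers a flush.
    target_batch_size: usize,
    /// Row limit taken from a `Limit` above this operator, if any (assumption).
    fetch: Option<usize>,
    /// Rows buffered so far and not yet emitted.
    buffer: Vec<u32>,
    /// Total rows already emitted downstream.
    rows_emitted: usize,
}

impl FetchAwareCoalescer {
    fn new(target_batch_size: usize, fetch: Option<usize>) -> Self {
        Self { target_batch_size, fetch, buffer: Vec::new(), rows_emitted: 0 }
    }

    /// Rows still needed to satisfy the fetch (usize::MAX when there is no fetch).
    fn remaining(&self) -> usize {
        match self.fetch {
            Some(f) => f.saturating_sub(self.rows_emitted),
            None => usize::MAX,
        }
    }

    /// True once the fetch has been fully satisfied.
    fn is_done(&self) -> bool {
        self.remaining() == 0
    }

    /// Accepts one input batch. Returns an output batch when either the
    /// fetch is satisfied or the target batch size is reached.
    fn push(&mut self, batch: Vec<u32>) -> Option<Vec<u32>> {
        if self.is_done() {
            return None; // nothing more is needed downstream
        }
        self.buffer.extend(batch);

        let remaining = self.remaining();
        if self.buffer.len() >= remaining {
            // Enough rows for the limit: drop the surplus and emit right away.
            self.buffer.truncate(remaining);
            return Some(self.flush());
        }
        if self.buffer.len() >= self.target_batch_size {
            return Some(self.flush());
        }
        None
    }

    /// Emits whatever is buffered and records how many rows went out.
    fn flush(&mut self) -> Vec<u32> {
        let out = std::mem::take(&mut self.buffer);
        self.rows_emitted += out.len();
        out
    }
}
```

   With `target_batch_size = 8192` and `fetch = Some(5)`, pushing two 3-row batches emits 5 rows on the second push; with `fetch = None` the same two pushes emit nothing, because the buffer is still far below the target size.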

