berkaysynnada commented on PR #9815: URL: https://github.com/apache/arrow-datafusion/pull/9815#issuecomment-2036369642
> Thanks @Lordworms -- I took a quick look at this PR
>
> I am probably missing something obvious but I don't understand the need for the pushdown pass in the physical optimizer.
>
> If the use case is to get a limit closer to `StreamingTableExec` then maybe we can push the `fetch` to the `CoalesceBatchesExec` rather than the `StreamingTableExec`?
>
> It seems to me that a limit in the `StreamingTableExec` can likely be implemented more efficiently, _and_ would already be handled by the existing Limit pushdown in the LogicalPlan.
>
> Maybe @berkaysynnada or @mustafasrepo have some more context

Thanks @alamb for the feedback. @Lordworms's strategy is intuitive and reasonable, but we may need a different way to solve the problem. To summarize https://github.com/apache/arrow-datafusion/issues/9792: when a `Limit` exists above `CoalesceBatches`, `CoalesceBatches` waits until all rows are collected, even though many of them may be discarded by the `Limit`. Therefore, `CoalesceBatches` needs to know the fetch count of the `Limit`: once that many rows have been collected, it should return them without waiting for more.
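To make the idea concrete, here is a minimal, self-contained sketch of a coalescer that is aware of a fetch count. It is not DataFusion code: `Batch`, `FetchAwareCoalescer`, and `run_with_limit` are hypothetical names invented for illustration, and a plain `Vec<i32>` stands in for a record batch. It only shows the intended behavior, namely emitting as soon as `fetch` rows have been buffered instead of waiting to fill `target_batch_size`.

```rust
/// Hypothetical stand-in for a record batch: just a vector of row values.
type Batch = Vec<i32>;

/// Buffers small input batches and emits one when either
/// `target_batch_size` rows are buffered or `fetch` rows have been seen.
struct FetchAwareCoalescer {
    target_batch_size: usize,
    fetch: Option<usize>,
    buffered: Vec<i32>,
    /// Rows accepted so far (already emitted plus currently buffered).
    total_rows: usize,
}

impl FetchAwareCoalescer {
    fn new(target_batch_size: usize, fetch: Option<usize>) -> Self {
        Self { target_batch_size, fetch, buffered: Vec::new(), total_rows: 0 }
    }

    /// True once the fetch count has been reached; no more input is needed.
    fn is_done(&self) -> bool {
        matches!(self.fetch, Some(f) if self.total_rows >= f)
    }

    /// Accepts one input batch and returns an output batch if one is ready.
    fn push(&mut self, batch: Batch) -> Option<Batch> {
        if self.is_done() {
            return None;
        }
        // Never accept more rows than the limit still allows.
        let remaining = self.fetch.map_or(usize::MAX, |f| f - self.total_rows);
        let take = batch.len().min(remaining);
        self.buffered.extend_from_slice(&batch[..take]);
        self.total_rows += take;

        // Emit early when the limit is satisfied, otherwise only when full.
        if self.is_done() || self.buffered.len() >= self.target_batch_size {
            Some(std::mem::take(&mut self.buffered))
        } else {
            None
        }
    }

    /// Flushes whatever is still buffered when the input ends.
    fn finish(&mut self) -> Option<Batch> {
        if self.buffered.is_empty() {
            None
        } else {
            Some(std::mem::take(&mut self.buffered))
        }
    }
}

/// Drives the coalescer over `inputs`; returns the output batches and the
/// number of input batches that actually had to be consumed.
fn run_with_limit(
    inputs: Vec<Batch>,
    target_batch_size: usize,
    fetch: Option<usize>,
) -> (Vec<Batch>, usize) {
    let mut coalescer = FetchAwareCoalescer::new(target_batch_size, fetch);
    let mut outputs = Vec::new();
    let mut consumed = 0;
    for batch in inputs {
        if coalescer.is_done() {
            break; // limit satisfied: stop polling the input
        }
        consumed += 1;
        if let Some(out) = coalescer.push(batch) {
            outputs.push(out);
        }
    }
    if let Some(rest) = coalescer.finish() {
        outputs.push(rest);
    }
    (outputs, consumed)
}
```

With four two-row input batches, a target batch size of 100, and `fetch = Some(3)`, this sketch emits a single three-row batch after consuming only two inputs. With `fetch = None` it consumes all four inputs before emitting anything, which is the waiting behavior described in the issue.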
