Re: [PR] feat: optimize CoalesceBatches in limit [datafusion]

via GitHub Thu, 15 Aug 2024 03:24:23 -0700


berkaysynnada commented on PR #11983:
URL: https://github.com/apache/datafusion/pull/11983#issuecomment-2291031872


   The issue explained in https://github.com/apache/datafusion/issues/9792 was 
resolved with the implementation of 
https://github.com/apache/datafusion/pull/11652. That fix addresses the case 
where a limit has to wait for the coalescer buffer to fill when a `Limit -> ... -> 
CoalesceBatches` pattern exists. The approach was to push down the limit (fetch 
+ skip) into `CoalesceBatches` and eliminate the limit when it was no longer 
needed.
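   
   To illustrate the idea, here is a minimal toy sketch of a coalescer that 
is aware of a pushed-down fetch. It is not DataFusion's actual code: 
`ToyCoalescer`, its fields, and its methods are hypothetical names, rows are 
plain `i32` values instead of record batches, and skip is left out for brevity.

```rust
/// Toy stand-in for a coalescing operator (hypothetical, for illustration only).
/// It buffers small input batches and emits one larger batch once `target`
/// rows are buffered, or earlier if the pushed-down `fetch` is satisfied.
struct ToyCoalescer {
    target: usize,        // desired number of rows per output batch
    fetch: Option<usize>, // pushed-down limit; None means "no limit"
    buffered: Vec<i32>,   // rows waiting to be emitted
    emitted: usize,       // rows already emitted
}

impl ToyCoalescer {
    fn new(target: usize, fetch: Option<usize>) -> Self {
        Self { target, fetch, buffered: Vec::new(), emitted: 0 }
    }

    /// Push one input batch; returns an output batch if one is ready.
    fn push(&mut self, batch: &[i32]) -> Option<Vec<i32>> {
        self.buffered.extend_from_slice(batch);
        if let Some(fetch) = self.fetch {
            // Rows the limit still allows us to emit.
            let remaining = fetch.saturating_sub(self.emitted);
            if self.buffered.len() >= remaining {
                // The limit is satisfied: cut off the extra rows and emit now
                // instead of waiting for the buffer to reach `target`.
                self.buffered.truncate(remaining);
                return self.flush();
            }
        }
        if self.buffered.len() >= self.target {
            return self.flush();
        }
        None
    }

    /// Emit whatever is buffered, if anything.
    fn flush(&mut self) -> Option<Vec<i32>> {
        if self.buffered.is_empty() {
            return None;
        }
        let out = std::mem::take(&mut self.buffered);
        self.emitted += out.len();
        Some(out)
    }

    /// True once the pushed-down limit has been fully served.
    fn limit_reached(&self) -> bool {
        self.fetch.map_or(false, |f| self.emitted >= f)
    }
}
```

   With `target = 8` and `fetch = Some(3)`, the sketch emits as soon as three 
rows are available. With `fetch = None` it keeps buffering until eight rows 
arrive, which is the waiting behavior the issue describes.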
   
   With https://github.com/apache/datafusion/pull/12003, it appears that 
additional corner cases are being addressed. It further refines the process by 
pushing limits as far down the execution plan as possible and removing any 
redundant limits.
   
   It seems that these recent improvements already address the objective you're 
aiming for, without the need to define a constant threshold. I think there is 
no difference between using a limit without coalescing and using a coalescer 
that can handle limits internally.
   
   I am curious about your thoughts. Do you still see a need for additional 
optimization? If so, could you provide an example scenario or a test case that 
would help us discuss this further?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


