acking-you opened a new issue, #11980:
URL: https://github.com/apache/datafusion/issues/11980

   ### Is your feature request related to a problem or challenge?
   
   The current [CoalesceBatches](https://docs.rs/datafusion/latest/src/datafusion/physical_optimizer/coalesce_batches.rs.html#38) optimization rule only creates [CoalesceBatchesExec](https://docs.rs/datafusion/latest/datafusion/physical_plan/coalesce_batches/struct.CoalesceBatchesExec.html) nodes whose target size is the `batch_size` configured in the [config struct](https://docs.rs/datafusion-common/41.0.0/src/datafusion_common/config.rs.html#238). It ignores any limit operator above the coalescer, which can cause problems for queries that only need a few rows.
   
   Consider the following scenario: the rule has inserted a `CoalesceBatchesExec`, a `limit` operator sits above it, and the `limit` value is smaller than `batch_size`. The coalescer keeps buffering until it has collected a full batch of `batch_size` rows (or its input is exhausted), so the entire computation may be blocked even though enough rows to satisfy the `limit` have already arrived.
   
   A possible operator tree:
   ```text
   SortExec: TopK(fetch=10), expr=[event_time@3 DESC]
     LocalLimitExec: fetch=100
       CoalesceBatchesExec: target_batch_size=8192
         FilterExec: event_time@3 = 10
           TableScanExec
   ```
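To make the blocking concrete, here is a simplified model of a coalescer under the tree above. It is a sketch, not DataFusion's `CoalesceBatchesExec`: batches are reduced to their row counts, `first_output` is a hypothetical helper, and the filter is assumed to let 5 rows through per scanned batch.

```rust
// Simplified model: each input batch is represented only by the number of rows
// that survive FilterExec. No Arrow types or DataFusion APIs are used.

/// Hypothetical stand-in for a coalescer: buffers rows until at least
/// `target_batch_size` rows are available (or the input ends), then emits.
/// Returns (rows in the first emitted batch, input batches consumed so far).
fn first_output(filtered_rows_per_batch: &[usize], target_batch_size: usize) -> (usize, usize) {
    let mut buffered = 0;
    for (i, rows) in filtered_rows_per_batch.iter().enumerate() {
        buffered += rows;
        if buffered >= target_batch_size {
            return (buffered, i + 1);
        }
    }
    // Input exhausted: whatever is buffered is flushed as the final batch.
    (buffered, filtered_rows_per_batch.len())
}

fn main() {
    // Assumption: a selective filter passes 5 rows out of every scanned batch.
    let input = vec![5usize; 2000];
    let fetch = 100;

    let (rows, consumed) = first_output(&input, 8192);
    println!("target=8192: first batch has {rows} rows after {consumed} input batches");

    let (rows, consumed) = first_output(&input, fetch);
    println!("target={fetch}: first batch has {rows} rows after {consumed} input batches");
}
```

Under these assumptions, a target of 8192 produces its first batch (8195 rows) only after 1639 input batches, while a target equal to the limit of 100 produces one after 20.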
   Special cases also need to be considered. For example, if the limit operator is above a `SortExec`, the limit should not affect `batch_size`, because the sort must consume its entire input before it can emit any rows.
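One way to express this special case is to walk down from the limit and give up as soon as an operator that needs its whole input appears before the `CoalesceBatchesExec`. The `Plan` enum and `limit_for_coalesce` below are hypothetical stand-ins, not DataFusion's `ExecutionPlan` API, and treating only projections as safe pass-through operators is an assumption.

```rust
/// Hypothetical, heavily simplified plan tree (not DataFusion's ExecutionPlan).
#[allow(dead_code)]
enum Plan {
    LocalLimit { fetch: usize, input: Box<Plan> },
    Sort { input: Box<Plan> },
    Projection { input: Box<Plan> },
    CoalesceBatches { target_batch_size: usize, input: Box<Plan> },
    Filter { input: Box<Plan> },
    Scan,
}

/// Starting at a limit, returns the fetch that may be used to size the first
/// CoalesceBatches below it, or None if the limit must not affect batch_size.
fn limit_for_coalesce(plan: &Plan) -> Option<usize> {
    let (fetch, mut node) = match plan {
        Plan::LocalLimit { fetch, input } => (*fetch, &**input),
        _ => return None,
    };
    loop {
        match node {
            // Reached the coalescer with only row-preserving operators in between.
            Plan::CoalesceBatches { .. } => return Some(fetch),
            // Streaming, row-preserving operator (assumed safe): keep walking down.
            Plan::Projection { input } => node = &**input,
            // Sort consumes its whole input before emitting, so the limit says
            // nothing about how many rows it will pull from the coalescer.
            Plan::Sort { .. } => return None,
            // Anything else: be conservative and leave batch_size unchanged.
            _ => return None,
        }
    }
}

fn main() {
    // LocalLimitExec -> CoalesceBatchesExec -> FilterExec -> scan
    let streaming = Plan::LocalLimit {
        fetch: 100,
        input: Box::new(Plan::CoalesceBatches {
            target_batch_size: 8192,
            input: Box::new(Plan::Filter { input: Box::new(Plan::Scan) }),
        }),
    };
    // LocalLimitExec -> SortExec -> CoalesceBatchesExec -> FilterExec -> scan
    let blocking = Plan::LocalLimit {
        fetch: 100,
        input: Box::new(Plan::Sort {
            input: Box::new(Plan::CoalesceBatches {
                target_batch_size: 8192,
                input: Box::new(Plan::Filter { input: Box::new(Plan::Scan) }),
            }),
        }),
    };
    println!("{:?}", limit_for_coalesce(&streaming));
    println!("{:?}", limit_for_coalesce(&blocking));
}
```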
   
   ### Describe the solution you'd like
   
   Determine `target_batch_size` from the limit operator's value and the current parallelism, instead of always using the configured `batch_size`.
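A sketch of one possible sizing rule, under stated assumptions: `target_batch_size` here is a hypothetical helper, not an existing DataFusion function. It treats `LocalLimitExec`'s `fetch` as applying to each partition, and for a limit shared across partitions it assumes each partition needs only its rounded-up share. Because a smaller target only changes when the coalescer emits batches, not which rows reach the limit, a heuristic like this affects latency rather than correctness.

```rust
/// Hypothetical heuristic: pick the coalescing target from the configured
/// batch size, the limit's fetch (if any), and the number of partitions.
fn target_batch_size(
    batch_size: usize,
    fetch: Option<usize>,
    partitions: usize,
    fetch_is_per_partition: bool,
) -> usize {
    // No limit above the coalescer: keep the configured batch size.
    let fetch = match fetch {
        Some(f) => f,
        None => return batch_size,
    };
    let per_partition = if fetch_is_per_partition {
        // e.g. LocalLimitExec: every partition may need up to `fetch` rows.
        fetch
    } else {
        // Assumed even split of a shared limit, rounded up.
        let p = partitions.max(1);
        (fetch + p - 1) / p
    };
    // Never exceed the configured size, and never go below one row.
    batch_size.min(per_partition.max(1))
}

fn main() {
    println!("{}", target_batch_size(8192, Some(100), 4, true));
    println!("{}", target_batch_size(8192, Some(100), 4, false));
    println!("{}", target_batch_size(8192, None, 4, false));
}
```

With `batch_size = 8192` and four partitions, this yields 100 for a per-partition fetch of 100, 25 for a shared limit of 100, and 8192 when there is no limit.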
   
   ### Describe alternatives you've considered
   
   When an operator below the limit operator requires its full input (e.g. `SortExec`), `batch_size` is not handled specially and keeps its configured value.
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


