Re: [PR] feat: support push batch direct to completed and add biggest coalesce batch support [arrow-rs]

via GitHub Sun, 17 Aug 2025 06:49:00 -0700


zhuqi-lucas commented on PR #8146:
URL: https://github.com/apache/arrow-rs/pull/8146#issuecomment-3194399065


   Optimized behavior (biggest_coalesce_batch_size = Some(limit)) — three cases
   
   - Case 1 — Empty buffer + large incoming batch (Direct bypass)
   
   Condition: incoming.size > limit and buffered_rows == 0.
   
   Action: Bypass coalescing; output the incoming batch unchanged.
   
   Example: limit=500, incoming 600 → output [600].
   
   - Case 2 — Buffer already large + large incoming batch (Flush then bypass)
   
   Condition: incoming.size > limit and buffered_rows > limit.
   
   Action: First flush the buffered rows as one output, then bypass and output the incoming batch unchanged.
   
   Purpose: Avoid merging an already large buffer with a large incoming batch, which would produce a single oversized batch.
   
   Example: limit=400, buffer 350+200=550, incoming 800 → outputs [550], [800].
   
   - Case 3 — Small buffer + large incoming batch (Normal coalesce/split)
   
   Condition: incoming.size > limit and 0 < buffered_rows <= limit.
   
   Action: Follow normal merging and splitting rules (merge buffer + incoming, then split by target_batch_size).
   
   Example (assuming target_batch_size=1000): limit=500, buffer 300, incoming 1200 → merge to 1500, output [1000] and keep [500] buffered.
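
   The three cases can be sketched as a toy model that tracks only row counts, not real Arrow arrays. This is an illustration of the decision order described above, not the code in the PR: the `RowCountCoalescer` type, its fields, and the `target_batch_size` of 1000 used in the tests are assumptions made for the example.

```rust
/// Hypothetical row-count-only model of the coalescer (not the arrow-rs type).
#[derive(Debug)]
struct RowCountCoalescer {
    target_batch_size: usize,
    biggest_coalesce_batch_size: Option<usize>,
    /// Rows currently held in the buffer.
    buffered_rows: usize,
    /// Row counts of batches that have been emitted, in order.
    completed: Vec<usize>,
}

impl RowCountCoalescer {
    fn new(target_batch_size: usize, biggest_coalesce_batch_size: Option<usize>) -> Self {
        assert!(target_batch_size > 0, "target_batch_size must be non-zero");
        Self {
            target_batch_size,
            biggest_coalesce_batch_size,
            buffered_rows: 0,
            completed: Vec::new(),
        }
    }

    /// Push a batch described only by its row count.
    fn push_batch(&mut self, incoming: usize) {
        if let Some(limit) = self.biggest_coalesce_batch_size {
            if incoming > limit {
                // Case 1: empty buffer, so emit the large batch unchanged.
                if self.buffered_rows == 0 {
                    self.completed.push(incoming);
                    return;
                }
                // Case 2: buffer is already large, so flush it first,
                // then emit the large batch unchanged.
                if self.buffered_rows > limit {
                    let flushed = std::mem::take(&mut self.buffered_rows);
                    self.completed.push(flushed);
                    self.completed.push(incoming);
                    return;
                }
                // Case 3: small non-empty buffer, fall through to the normal path.
            }
        }
        // Normal path: merge into the buffer, then split by target_batch_size.
        self.buffered_rows += incoming;
        while self.buffered_rows >= self.target_batch_size {
            self.completed.push(self.target_batch_size);
            self.buffered_rows -= self.target_batch_size;
        }
    }
}
```

   Checking the two early-return cases before the normal path keeps the unoptimized behavior intact whenever the limit is `None` or the incoming batch is not larger than the limit.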


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

