zhuqi-lucas opened a new pull request, #8112:
URL: https://github.com/apache/arrow-rs/pull/8112

   # Which issue does this PR close?
   
   - Related to https://github.com/apache/datafusion/pull/16249
   
   - Related to https://github.com/apache/arrow-rs/issues/6692
   - Related to https://github.com/apache/datafusion/issues/3463
   
   
   # Rationale for this change
   
   To stay consistent with the original behaviour of BatchCoalescer in DataFusion, this PR introduces a new config option that allows a non-exact output batch size. The documented semantics are:
   
   ```rust
   /// # Notes:
   ///
   /// 1. Output rows are produced in the same order as the input rows
   ///
   /// 2. The output is a sequence of batches, with all but the last being at exactly
   ///    `target_batch_size` rows.
   ///
   /// Notes on `exact_size`:
   ///
   /// - `exact_size == true` (strict): output batches are produced so that all but
   ///   the final batch have exactly `target_batch_size` rows (default behavior).
   /// - `exact_size == false` (non-strict, default for this crate): output batches
   ///   will be produced when the buffered rows are >= `target_batch_size`. The
   ///   produced batch may be larger than `target_batch_size` (i.e., size >= target).
   ```
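
The difference between the two modes can be sketched with a toy model that tracks only row counts. This is not the arrow-rs implementation: `coalesce_row_counts` is a hypothetical helper, and it assumes that non-strict mode emits everything buffered as soon as the threshold is reached.

```rust
/// Toy model of the two coalescing modes, tracking row counts only.
///
/// `coalesce_row_counts` is a hypothetical helper for illustration; it is
/// not part of the arrow-rs API. It returns the row count of each output batch.
fn coalesce_row_counts(input: &[usize], target_batch_size: usize, exact_size: bool) -> Vec<usize> {
    assert!(target_batch_size > 0, "target_batch_size must be positive");
    let mut output = Vec::new();
    let mut buffered = 0usize; // rows received but not yet emitted

    for &rows in input {
        buffered += rows;
        if exact_size {
            // Strict: emit batches of exactly `target_batch_size` rows
            // for as long as enough rows are buffered.
            while buffered >= target_batch_size {
                output.push(target_batch_size);
                buffered -= target_batch_size;
            }
        } else if buffered >= target_batch_size {
            // Non-strict (assumption): emit everything buffered at once,
            // so the batch may be larger than `target_batch_size`.
            output.push(buffered);
            buffered = 0;
        }
    }

    // Flush the remainder as a final, possibly smaller, batch.
    if buffered > 0 {
        output.push(buffered);
    }
    output
}
```

With input batches of 3, 4, 2, 6 and 1 rows and a target of 5, this model yields batches of 5, 5, 5 and 1 rows in strict mode, and 7, 8 and 1 rows in non-strict mode. The non-strict path never splits buffered rows across output batches, which is the plausible source of the performance gain.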
   
   
   # What changes are included in this PR?
   
   1. Support an `exact_size` config for BatchCoalescer.
   2. Switch the benchmark to non-exact size, so that the performance improvement can be measured.
   
   # Are these changes tested?
   
   Yes, new tests have been added.
   
   # Are there any user-facing changes?
   
   No, the default behaviour is still `exact_size = true`.
   

