andygrove commented on PR #835:
URL: https://github.com/apache/datafusion-comet/pull/835#issuecomment-2294105508

   After addressing the first round of feedback, we now have:
   
   ```rust
   pub fn comet_filter_record_batch(
       record_batch: &RecordBatch,
       predicate: &BooleanArray,
   ) -> std::result::Result<RecordBatch, ArrowError> {
       if predicate.true_count() == record_batch.num_rows() {
           // special case where we just make an exact copy
           let arrays: Vec<ArrayRef> = record_batch
               .columns()
               .iter()
               .map(|array| {
                   let capacity = array.len();
                   let data = array.to_data();
                   let mut mutable = MutableArrayData::new(vec![&data], false, capacity);
                   mutable.extend(0, 0, capacity);
                   make_array(mutable.freeze())
               })
               .collect();
           let options = RecordBatchOptions::new().with_row_count(Some(record_batch.num_rows()));
           RecordBatch::try_new_with_options(record_batch.schema().clone(), arrays, &options)
       } else {
           filter_record_batch(record_batch, predicate)
       }
   }
   ```
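   
   To make the intent of the fast path concrete, here is a minimal std-only 
   sketch of the same idea. It is a simplified model, not the Arrow API: 
   `Column` and `filter_with_copy` are hypothetical names, and an 
   `Rc<Vec<i32>>` stands in for a shared Arrow buffer. The point it 
   illustrates is that the all-true case still returns a new allocation 
   rather than another handle to the input.
   
   ```rust
   use std::rc::Rc;
   
   /// Hypothetical stand-in for an Arrow array: a shared, reference-counted buffer.
   type Column = Rc<Vec<i32>>;
   
   /// Filters `column` by `mask`, always returning a freshly allocated buffer.
   fn filter_with_copy(column: &Column, mask: &[bool]) -> Column {
       assert_eq!(column.len(), mask.len());
       let true_count = mask.iter().filter(|&&keep| keep).count();
       if true_count == column.len() {
           // Fast path: every row is selected, so copy the buffer wholesale
           // instead of evaluating the mask row by row.
           Rc::new(column.to_vec())
       } else {
           // General path: keep only the rows whose mask entry is true.
           let kept: Vec<i32> = column
               .iter()
               .zip(mask)
               .filter_map(|(value, keep)| if *keep { Some(*value) } else { None })
               .collect();
           Rc::new(kept)
       }
   }
   
   fn main() {
       let input: Column = Rc::new(vec![1, 2, 3]);
   
       // All rows selected: same contents, but a distinct allocation.
       let all = filter_with_copy(&input, &[true, true, true]);
       assert_eq!(*all, vec![1, 2, 3]);
       assert!(!Rc::ptr_eq(&input, &all));
   
       // Some rows selected: the general path drops the unselected row.
       let some = filter_with_copy(&input, &[true, false, true]);
       assert_eq!(*some, vec![1, 3]);
   }
   ```
   
   The `Rc::ptr_eq` check is the property that matters in this model: if the 
   fast path returned `Rc::clone(column)` instead, the output would alias the 
   input buffer, and any later reuse of that buffer would change the 
   "filtered" result as well.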
   
   New benchmark results:
   
   ```
   filter/comet_filter - few
                           time:   [14.650 µs 14.727 µs 14.831 µs]
                           change: [-36.702% -35.128% -33.681%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 2 outliers among 100 measurements (2.00%)
     2 (2.00%) high severe
   filter/comet_filter - many
                           time:   [75.962 µs 76.172 µs 76.381 µs]
                           change: [-48.681% -48.501% -48.303%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high severe
   filter/comet_filter - all
                           time:   [34.497 µs 34.628 µs 34.764 µs]
                           change: [-80.854% -80.527% -80.256%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 8 outliers among 100 measurements (8.00%)
     6 (6.00%) low mild
     1 (1.00%) high mild
     1 (1.00%) high severe
   ```
   
   This certainly looks a lot better. I am running TPC-DS again to confirm 
   that this version really does copy the data in every case. I tried an 
   approach like this in the past but ran into data corruption issues.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

