This is an automated email from the ASF dual-hosted git repository.

alamb pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/main by this push:
     new 2f5ae5c85d Skip redundant validation checks in RecordBatch#project (#8583)
2f5ae5c85d is described below

commit 2f5ae5c85d41baa55d81bab0a54066ac12be1c12
Author: Pepijn Van Eeckhoudt <[email protected]>
AuthorDate: Tue Oct 14 17:52:59 2025 +0200

    Skip redundant validation checks in RecordBatch#project (#8583)
    
    # Which issue does this PR close?
    
    - Closes #8591.
    
    # Rationale for this change
    
    `RecordBatch#project` currently builds its result with the validating
    factory function `try_new_with_options`. Since `project` starts from an
    already valid `RecordBatch`, these checks are redundant. A small amount
    of work can be saved by using `new_unchecked` instead.
    
    A change I'm working on for DataFusion uses `RecordBatch#project` in the
    inner expression evaluation loop to reduce the amount of redundant array
    filtering that `case` expressions need to do. Although this is a
    micro-optimisation, avoiding redundant work in inner loops seems
    worthwhile.
    
    # What changes are included in this PR?
    
    - Use `new_unchecked` instead of `try_new_with_options` in
    `RecordBatch#project`
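    
    To illustrate the idea, here is a minimal, self-contained sketch of the
    checked versus unchecked constructor pattern. It deliberately does not
    use the arrow-rs types: `ToyBatch`, its fields, and its `try_new`,
    `new_unchecked` and `project` methods are hypothetical stand-ins, and the
    checks shown are only an assumption about the kind of validation that can
    be skipped when the input is already known to be valid.
    
    ```rust
    /// Hypothetical toy model of a batch; NOT the arrow-rs `RecordBatch`.
    #[derive(Debug, Clone, PartialEq)]
    struct ToyBatch {
        names: Vec<String>,
        columns: Vec<Vec<i64>>,
        row_count: usize,
    }
    
    impl ToyBatch {
        /// Validating constructor: checks the invariants before building.
        fn try_new(
            names: Vec<String>,
            columns: Vec<Vec<i64>>,
            row_count: usize,
        ) -> Result<Self, String> {
            if names.len() != columns.len() {
                return Err(format!(
                    "{} names but {} columns",
                    names.len(),
                    columns.len()
                ));
            }
            if let Some(i) = columns.iter().position(|c| c.len() != row_count) {
                return Err(format!(
                    "column {i} has {} rows, expected {row_count}",
                    columns[i].len()
                ));
            }
            Ok(Self { names, columns, row_count })
        }
    
        /// Non-validating constructor.
        ///
        /// # Safety
        /// The caller must guarantee the invariants `try_new` would check.
        unsafe fn new_unchecked(
            names: Vec<String>,
            columns: Vec<Vec<i64>>,
            row_count: usize,
        ) -> Self {
            Self { names, columns, row_count }
        }
    
        /// Select a subset of columns by index.
        fn project(&self, indices: &[usize]) -> Result<Self, String> {
            let mut names = Vec::with_capacity(indices.len());
            let mut columns = Vec::with_capacity(indices.len());
            for &i in indices {
                // Index bounds are the only thing that can still go wrong.
                let col = self
                    .columns
                    .get(i)
                    .ok_or_else(|| format!("index {i} out of bounds"))?;
                names.push(self.names[i].clone());
                columns.push(col.clone());
            }
            // SAFETY: `self` is already valid and every selected column is
            // taken from it unchanged, so the invariants still hold.
            Ok(unsafe { Self::new_unchecked(names, columns, self.row_count) })
        }
    }
    ```
    
    In this sketch the only check `project` still performs is the index
    bounds check; the per-column length comparison in `try_new` is not
    repeated because it was already established when the source batch was
    built.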
    
    # Are these changes tested?
    
    No additional tests were added.
    The performance difference was demonstrated with a microbenchmark.
    
    # Are there any user-facing changes?
    
    No
    
    Co-authored-by: Andrew Lamb <[email protected]>
---
 arrow-array/src/record_batch.rs | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/arrow-array/src/record_batch.rs b/arrow-array/src/record_batch.rs
index d1c24b47a8..cfec969165 100644
--- a/arrow-array/src/record_batch.rs
+++ b/arrow-array/src/record_batch.rs
@@ -445,14 +445,16 @@ impl RecordBatch {
             })
             .collect::<Result<Vec<_>, _>>()?;
 
-        RecordBatch::try_new_with_options(
-            SchemaRef::new(projected_schema),
-            batch_fields,
-            &RecordBatchOptions {
-                match_field_names: true,
-                row_count: Some(self.row_count),
-            },
-        )
+        unsafe {
+            // Since we're starting from a valid RecordBatch and project
+            // creates a strict subset of the original, there's no need to
+            // redo the validation checks in `try_new_with_options`.
+            Ok(RecordBatch::new_unchecked(
+                SchemaRef::new(projected_schema),
+                batch_fields,
+                self.row_count,
+            ))
+        }
     }
 
     /// Normalize a semi-structured [`RecordBatch`] into a flat table.