This is an automated email from the ASF dual-hosted git repository.
alamb pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git
The following commit(s) were added to refs/heads/main by this push:
new 2f5ae5c85d Skip redundant validation checks in RecordBatch#project (#8583)
2f5ae5c85d is described below
commit 2f5ae5c85d41baa55d81bab0a54066ac12be1c12
Author: Pepijn Van Eeckhoudt <[email protected]>
AuthorDate: Tue Oct 14 17:52:59 2025 +0200
Skip redundant validation checks in RecordBatch#project (#8583)
# Which issue does this PR close?
- Closes #8591.
# Rationale for this change
`RecordBatch#project` currently uses the validating factory function
`try_new_with_options`. Since `project` starts from a valid `RecordBatch`,
these checks are redundant. A small amount of work can be saved by using
`new_unchecked` instead.
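To show why the checks are redundant, below is a minimal, std-only sketch. `MiniBatch`, `Column`, `try_new`, and `project` are hypothetical stand-ins, not arrow-rs types. The real `try_new_with_options` checks more than this toy does (among other things, column data types against the schema). The sketch only models the two invariants that projection trivially preserves: the field and column counts match, and every column has `row_count` rows.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for an Arrow array: a shared vector of values.
type Column = Arc<Vec<i64>>;

/// Hypothetical, simplified stand-in for `RecordBatch`.
#[derive(Debug, Clone)]
pub struct MiniBatch {
    fields: Vec<String>,
    columns: Vec<Column>,
    row_count: usize,
}

impl MiniBatch {
    /// Validating constructor, loosely analogous to `try_new_with_options`.
    pub fn try_new(fields: Vec<String>, columns: Vec<Column>, row_count: usize) -> Result<Self, String> {
        if fields.len() != columns.len() {
            return Err(format!("{} fields but {} columns", fields.len(), columns.len()));
        }
        for (name, col) in fields.iter().zip(&columns) {
            if col.len() != row_count {
                return Err(format!("column {name} has {} rows, expected {row_count}", col.len()));
            }
        }
        Ok(Self { fields, columns, row_count })
    }

    /// Unchecked constructor, loosely analogous to `new_unchecked`.
    ///
    /// # Safety
    /// The caller must guarantee every invariant that `try_new` would check.
    pub unsafe fn new_unchecked(fields: Vec<String>, columns: Vec<Column>, row_count: usize) -> Self {
        Self { fields, columns, row_count }
    }

    /// Projection reuses columns from `self`, which already passed validation.
    pub fn project(&self, indices: &[usize]) -> Result<Self, String> {
        let mut fields = Vec::with_capacity(indices.len());
        let mut columns = Vec::with_capacity(indices.len());
        for &i in indices {
            // Only the index bounds still need checking.
            let col = self
                .columns
                .get(i)
                .ok_or_else(|| format!("project index {i} out of bounds"))?;
            fields.push(self.fields[i].clone());
            columns.push(Arc::clone(col));
        }
        // SAFETY: fields and columns are pushed in pairs, so their counts match,
        // and every column was validated to have `self.row_count` rows when
        // `self` was built.
        Ok(unsafe { Self::new_unchecked(fields, columns, self.row_count) })
    }
}

fn main() {
    let batch = MiniBatch::try_new(
        vec!["a".into(), "b".into(), "c".into()],
        vec![Arc::new(vec![1, 2]), Arc::new(vec![3, 4]), Arc::new(vec![5, 6])],
        2,
    )
    .expect("valid batch");
    let projected = batch.project(&[2, 0]).expect("in-bounds projection");
    println!("{:?} rows={}", projected.fields, projected.row_count);
}
```

Running `main` prints `["c", "a"] rows=2`. Skipping validation is sound here only because `new_unchecked` receives columns taken from an already-validated batch. A path that accepted caller-supplied columns would still need the validating constructor.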
A change I'm working on for DataFusion uses `RecordBatch#project` in the
inner expression evaluation loop to reduce the amount of redundant array
filtering that `case` expressions need to do. While this is a
micro-optimisation, avoiding redundant work in inner loops seems worthwhile.
# What changes are included in this PR?
- Use `new_unchecked` instead of `try_new_with_options` in
`RecordBatch#project`
# Are these changes tested?
No additional tests added.
The performance difference was demonstrated with a microbenchmark.
# Are there any user-facing changes?
No
Co-authored-by: Andrew Lamb <[email protected]>
---
arrow-array/src/record_batch.rs | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/arrow-array/src/record_batch.rs b/arrow-array/src/record_batch.rs
index d1c24b47a8..cfec969165 100644
--- a/arrow-array/src/record_batch.rs
+++ b/arrow-array/src/record_batch.rs
@@ -445,14 +445,16 @@ impl RecordBatch {
})
.collect::<Result<Vec<_>, _>>()?;
- RecordBatch::try_new_with_options(
- SchemaRef::new(projected_schema),
- batch_fields,
- &RecordBatchOptions {
- match_field_names: true,
- row_count: Some(self.row_count),
- },
- )
+ unsafe {
+ // Since we're starting from a valid RecordBatch and project
+ // creates a strict subset of the original, there's no need to
+ // redo the validation checks in `try_new_with_options`.
+ Ok(RecordBatch::new_unchecked(
+ SchemaRef::new(projected_schema),
+ batch_fields,
+ self.row_count,
+ ))
+ }
}
/// Normalize a semi-structured [`RecordBatch`] into a flat table.