mbutrovich opened a new pull request, #3140:
URL: https://github.com/apache/datafusion-comet/pull/3140
## Which issue does this PR close?
Closes #.
## Rationale for this change
PR #3077 added support for hashing complex types (arrays, structs, maps) but
used a generic recursive approach that could add overhead:
1. **Array slicing overhead**: For each element in a List/Map, the code
called `slice(idx, 1)` which creates a new `ArrayRef` - this involves heap
allocation and reference counting overhead
2. **Recursive dispatch overhead**: Each element was hashed via a recursive
call through the full type dispatch system (`match col.data_type()`)
3. **Iterator chain overhead**: Primitive type hashing used
`.iter_mut().zip().enumerate()` which creates iterator adapter overhead
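To make the first two points concrete, here is a minimal sketch that uses only the standard library. The `Column` enum, `hash_i64`, and both hashing functions are hypothetical stand-ins, not Comet or Arrow APIs, and copying one element into a fresh `Vec` is only a loose analogy for `slice(idx, 1)`.

```rust
/// Hypothetical toy column type standing in for an Arrow array.
enum Column {
    Int32(Vec<i32>),
    Int64(Vec<i64>),
}

/// Hypothetical mixing function; NOT the hash used by Comet.
fn hash_i64(value: i64, seed: u32) -> u32 {
    let bits = value as u64;
    let folded = (bits ^ (bits >> 32)) as u32;
    (seed ^ folded).wrapping_mul(0x9E37_79B1).rotate_left(13)
}

/// Dispatch on the type once, then run a tight loop over the values.
fn hash_dispatch_once(col: &Column, seed: u32) -> u32 {
    match col {
        Column::Int32(v) => v.iter().fold(seed, |h, &x| hash_i64(x as i64, h)),
        Column::Int64(v) => v.iter().fold(seed, |h, &x| hash_i64(x, h)),
    }
}

/// Per-element shape: build a one-element column for every value and
/// send it back through the full type dispatch.
fn hash_per_element(col: &Column, seed: u32) -> u32 {
    let len = match col {
        Column::Int32(v) => v.len(),
        Column::Int64(v) => v.len(),
    };
    let mut h = seed;
    for i in 0..len {
        // One allocation and one extra match per element.
        let one = match col {
            Column::Int32(v) => Column::Int32(vec![v[i]]),
            Column::Int64(v) => Column::Int64(vec![v[i]]),
        };
        h = hash_dispatch_once(&one, h);
    }
    h
}
```

Both functions return the same hash in this sketch. The per-element version simply does more bookkeeping for each value, which is the kind of overhead the rationale describes.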
## What changes are included in this PR?
1. Optimized primitive type hashing (all numeric types, boolean, decimal)
Changed from iterator chains to direct array indexing:
```rust
// Before: iterator overhead
for (hash, value) in $hashes.iter_mut().zip(values.iter()) {
*hash = $hash_method(value, *hash);
}
// After: direct indexing
for i in 0..values.len() {
$hashes[i] = $hash_method(values[i], $hashes[i]);
}
```
Applied to all 6 hash macros: `hash_array`, `hash_array_boolean`,
`hash_array_primitive`, `hash_array_primitive_float`,
`hash_array_small_decimal`, `hash_array_decimal`
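As a standalone illustration of the two loop shapes outside of a macro, the sketch below runs both over plain slices. `mix_hash` is a hypothetical stand-in for `$hash_method`, not the hash Comet uses, and the function names are invented for this example.

```rust
/// Hypothetical stand-in for `$hash_method`; NOT the hash used by Comet.
fn mix_hash(value: i32, seed: u32) -> u32 {
    (seed ^ (value as u32)).wrapping_mul(0x9E37_79B1).rotate_left(13)
}

/// "Before" shape: zip the hash buffer with the values.
fn hash_with_iterators(values: &[i32], hashes: &mut [u32]) {
    for (hash, value) in hashes.iter_mut().zip(values.iter()) {
        *hash = mix_hash(*value, *hash);
    }
}

/// "After" shape: index both buffers directly.
fn hash_with_indexing(values: &[i32], hashes: &mut [u32]) {
    // Checking the lengths up front keeps the indexed loop from
    // panicking partway through on mismatched buffers.
    assert_eq!(values.len(), hashes.len());
    for i in 0..values.len() {
        hashes[i] = mix_hash(values[i], hashes[i]);
    }
}
```

The two shapes produce identical hashes for equal-length inputs. One behavioral difference to keep in mind: `zip` silently stops at the shorter input, while indexing panics if `hashes` is shorter than `values`, which is why this sketch asserts the lengths match.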
2. Specialized complex type hashing for primitive elements
Added new `hash_list_primitive!` macro with direct buffer access,
eliminating per-element array slicing and recursive calls.
Optimized types:
- List<primitive>: Int8/16/32/64, Float32/64, Boolean, Utf8, Binary, Date32,
Timestamp
- LargeList<primitive>: Same types as List
- FixedSizeList<primitive>: Int32, Int64, Float64
- Map<primitive, primitive>: Common combinations (String→Int32,
Int32→String, String→String, Int32→Int32)
Fallback: Complex nested types still use the original recursive approach,
which is appropriate for these less common cases.
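The direct-buffer idea for lists of primitives can be sketched roughly as follows. This is not the `hash_list_primitive!` macro itself: `Int32ListColumn` and `fold_hash` are hypothetical, offsets are stored as `usize` for simplicity, and null handling is left out.

```rust
/// Hypothetical stand-in for the per-element hash; NOT the hash used by Comet.
fn fold_hash(value: i32, seed: u32) -> u32 {
    (seed ^ (value as u32)).wrapping_mul(0x9E37_79B1).rotate_left(13)
}

/// Toy model of a list-of-i32 column: row `i` owns
/// `values[offsets[i]..offsets[i + 1]]`.
struct Int32ListColumn {
    offsets: Vec<usize>,
    values: Vec<i32>,
}

/// Hash every row by walking the flat values buffer directly,
/// with no per-element slicing and no per-element type dispatch.
fn hash_list_rows(col: &Int32ListColumn, hashes: &mut [u32]) {
    assert_eq!(col.offsets.len(), hashes.len() + 1);
    for row in 0..hashes.len() {
        let (start, end) = (col.offsets[row], col.offsets[row + 1]);
        let mut h = hashes[row];
        for &v in &col.values[start..end] {
            h = fold_hash(v, h);
        }
        hashes[row] = h;
    }
}
```

In this sketch an empty list leaves the row's incoming hash unchanged, and each non-empty row chains its elements' hashes in order starting from that row's seed.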
## How are these changes tested?
Existing tests.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]