andygrove opened a new issue, #4626:
URL: https://github.com/apache/datafusion-comet/issues/4626
## What
Pick up the work started in #3659 (which I am closing). The goal is to speed
up the nullable bulk-append paths in `native/shuffle/src/spark_unsafe/list.rs`
(`impl_append_to_builder` macro, `append_booleans`, `append_timestamps`,
`append_dates`) by using slice-based access instead of per-element
`read_unaligned()` + manual pointer arithmetic.
## Where
`native/shuffle/src/spark_unsafe/list.rs`
Microbenchmark: `native/core/benches/array_element_append.rs` exercises i32,
i64, f64, date32, and timestamp with and without nulls at 10k elements.
## Background
#3659 had a draft that converted nullable paths to slice access guarded by a
runtime alignment check. Review feedback from @mbutrovich on that PR (worth
reading the inline comments before starting):
1. **Alignment is NOT guaranteed** — the PR body claimed otherwise, but
`unsafe_object.rs:49` and the existing comment at `list.rs:105` document that
the array base can be unaligned when nested in a row's variable-length region.
The runtime alignment check is load-bearing for soundness;
`std::slice::from_raw_parts::<T>` on a misaligned pointer is UB. Keep the
check, and make the PR description accurately describe the unaligned fallback.
2. **The bigger win is removing the per-element null branch** — in the
nullable path, the loop still branches on `is_null_in_bitset` per element.
Reviewer suggested either:
- `PrimitiveBuilder::append_values(values: &[T::Native], is_valid:
&[bool])` after materializing a `&[bool]` from the null bitset
- Build a `BooleanBuffer` from the Spark null bitmap and feed validity to
the builder directly (skips the bool materialization)
For booleans (alignment 1, no alignment fallback needed),
`BooleanBuilder` may be able to absorb the validity in bulk too.
3. **Post benchmark numbers from `array_element_append.rs`** — the
non-nullable aligned branch is structurally unchanged, so the interesting
comparison is the nullable rows, especially if the bulk-validity change is
included.
## Suggested approach
- Land the loop reshape + bulk validity together (rather than splitting), so
the benchmark numbers tell a complete story.
- Update the PR description to reflect that the unaligned fallback exists
and explain *why* (nested-array case) so a future reader does not delete the
runtime check.
## References
- Closed PR: #3659
- Inline review comments:
https://github.com/apache/datafusion-comet/pull/3659#pullrequestreview-4320281277
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]