etseidl commented on code in PR #9967:
URL: https://github.com/apache/arrow-rs/pull/9967#discussion_r3236839264
##########
parquet/src/arrow/arrow_writer/levels.rs:
##########
@@ -645,17 +645,58 @@ impl LevelInfoBuilder {
match &info.logical_nulls {
Some(nulls) => {
assert!(range.end <= nulls.len());
- let nulls = nulls.inner();
- info.def_levels.extend_from_iter(range.clone().map(|i| {
- // Safety: range.end was asserted to be in bounds earlier
- let valid = unsafe { nulls.value_unchecked(i) };
- max_def_level - (!valid as i16)
- }));
- info.non_null_indices.reserve(len);
- info.non_null_indices.extend(
- BitIndexIterator::new(nulls.inner(), nulls.offset() + range.start, len)
- .map(|i| i + range.start),
- );
+ // Bulk-fill fast path. Gated on:
+ // - `len >= 64`: per-call slice/popcount/iterator overhead only
+ //   amortizes on sizable sub-ranges. List/struct paths call
+ //   write_leaf many times with tiny ranges (avg list length 1-5);
+ //   paying any per-call popcount there would regress them. A
+ //   threshold sweep at T={0,16,32,64,128,256} on Ryzen 9 9950X
+ //   shows the regression floor settles by T=32 and the choice of
+ //   64 gives ~12x margin over avg list length without losing the
+ //   flat-primitive wins.
+ // - `nulls.null_count() * 2 >= nulls.len()`: cached `null_count()`
+ //   is O(1), so this check is free. We use the buffer-level density
+ //   as a heuristic for the sub-range; for full-array writes (the
+ //   primary target — flat primitive columns) it's exact.
+ // Note: even when this gate skips the fast path, evaluating the gate
+ // itself across high-frequency call sites (~10K calls in some list
+ // benchmarks) is a small structural cost (~+1-2% on list-sparse
+ // cases). It's the price of having any gate at all on this hot path;
+ // reducing it further would require hoisting the decision into the
+ // caller. The wins on the targeted shapes (-35% sparse-primitive,
+ // -66% all-null primitive) far outweigh it.
Review Comment:
Please refer to
https://github.com/apache/arrow-rs/blob/main/CONTRIBUTING.md#ai-generated-submissions
I think this exposition is better in the PR than the code.
##########
parquet/src/arrow/arrow_writer/levels.rs:
##########
@@ -645,17 +645,58 @@ impl LevelInfoBuilder {
match &info.logical_nulls {
Some(nulls) => {
assert!(range.end <= nulls.len());
- let nulls = nulls.inner();
- info.def_levels.extend_from_iter(range.clone().map(|i| {
- // Safety: range.end was asserted to be in bounds earlier
- let valid = unsafe { nulls.value_unchecked(i) };
- max_def_level - (!valid as i16)
- }));
- info.non_null_indices.reserve(len);
- info.non_null_indices.extend(
- BitIndexIterator::new(nulls.inner(), nulls.offset() + range.start, len)
- .map(|i| i + range.start),
- );
+ // Bulk-fill fast path. Gated on:
+ // - `len >= 64`: per-call slice/popcount/iterator overhead only
+ //   amortizes on sizable sub-ranges. List/struct paths call
+ //   write_leaf many times with tiny ranges (avg list length 1-5);
+ //   paying any per-call popcount there would regress them. A
+ //   threshold sweep at T={0,16,32,64,128,256} on Ryzen 9 9950X
+ //   shows the regression floor settles by T=32 and the choice of
+ //   64 gives ~12x margin over avg list length without losing the
+ //   flat-primitive wins.
+ // - `nulls.null_count() * 2 >= nulls.len()`: cached `null_count()`
+ //   is O(1), so this check is free. We use the buffer-level density
+ //   as a heuristic for the sub-range; for full-array writes (the
+ //   primary target — flat primitive columns) it's exact.
+ // Note: even when this gate skips the fast path, evaluating the gate
+ // itself across high-frequency call sites (~10K calls in some list
+ // benchmarks) is a small structural cost (~+1-2% on list-sparse
+ // cases). It's the price of having any gate at all on this hot path;
+ // reducing it further would require hoisting the decision into the
+ // caller. The wins on the targeted shapes (-35% sparse-primitive,
+ // -66% all-null primitive) far outweigh it.
+ if len >= 64 && nulls.null_count() * 2 >= nulls.len() {
Review Comment:
Please replace `64` with a constant. Documentation for the constant can
point to this PR for an explanation as to how 64 was arrived at.
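A minimal sketch of what the suggested change might look like. The names `BULK_FILL_MIN_LEN` and `use_bulk_fill` are hypothetical, chosen here for illustration; the gate condition itself is taken from the quoted diff:

```rust
/// Minimum sub-range length for the bulk-fill fast path.
///
/// Hypothetical constant illustrating the reviewer's suggestion; per the
/// review, its doc comment would point at the PR for the threshold sweep
/// that arrived at 64.
const BULK_FILL_MIN_LEN: usize = 64;

/// Hypothetical helper mirroring the gate in the quoted diff:
/// take the fast path only for sizable, null-dense sub-ranges.
fn use_bulk_fill(len: usize, null_count: usize, buffer_len: usize) -> bool {
    len >= BULK_FILL_MIN_LEN && null_count * 2 >= buffer_len
}

fn main() {
    // Tiny list sub-ranges skip the fast path regardless of density.
    assert!(!use_bulk_fill(5, 5, 5));
    // Large, null-dense ranges take it.
    assert!(use_bulk_fill(1024, 600, 1024));
    // Large but mostly-valid buffers fall back to the per-value path.
    assert!(!use_bulk_fill(1024, 100, 1024));
}
```

Keeping the threshold in a named constant also makes a future re-tune a one-line change.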
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]