etseidl commented on code in PR #9967:
URL: https://github.com/apache/arrow-rs/pull/9967#discussion_r3236839264


##########
parquet/src/arrow/arrow_writer/levels.rs:
##########
@@ -645,17 +645,58 @@ impl LevelInfoBuilder {
             match &info.logical_nulls {
                 Some(nulls) => {
                     assert!(range.end <= nulls.len());
-                    let nulls = nulls.inner();
-                    info.def_levels.extend_from_iter(range.clone().map(|i| {
-                        // Safety: range.end was asserted to be in bounds earlier
-                        let valid = unsafe { nulls.value_unchecked(i) };
-                        max_def_level - (!valid as i16)
-                    }));
-                    info.non_null_indices.reserve(len);
-                    info.non_null_indices.extend(
-                        BitIndexIterator::new(nulls.inner(), nulls.offset() + range.start, len)
-                            .map(|i| i + range.start),
-                    );
+                    // Bulk-fill fast path. Gated on:
+                    //   - `len >= 64`: per-call slice/popcount/iterator overhead only
+                    //     amortizes on sizable sub-ranges. List/struct paths call
+                    //     write_leaf many times with tiny ranges (avg list length 1-5);
+                    //     paying any per-call popcount there would regress them. A
+                    //     threshold sweep at T={0,16,32,64,128,256} on Ryzen 9 9950X
+                    //     shows the regression floor settles by T=32 and the choice of
+                    //     64 gives ~12x margin over avg list length without losing the
+                    //     flat-primitive wins.
+                    //   - `nulls.null_count() * 2 >= nulls.len()`: cached `null_count()`
+                    //     is O(1), so this check is free. We use the buffer-level density
+                    //     as a heuristic for the sub-range; for full-array writes (the
+                    //     primary target — flat primitive columns) it's exact.
+                    // Note: even when this gate skips the fast path, evaluating the gate
+                    // itself across high-frequency call sites (~10K calls in some list
+                    // benchmarks) is a small structural cost (~+1-2% on list-sparse
+                    // cases). It's the price of having any gate at all on this hot path;
+                    // reducing it further would require hoisting the decision into the
+                    // caller. The wins on the targeted shapes (-35% sparse-primitive,
+                    // -66% all-null primitive) far outweigh it.

Review Comment:
   Please refer to https://github.com/apache/arrow-rs/blob/main/CONTRIBUTING.md#ai-generated-submissions
   
   I think this exposition is better placed in the PR description than in the code.



##########
parquet/src/arrow/arrow_writer/levels.rs:
##########
@@ -645,17 +645,58 @@ impl LevelInfoBuilder {
             match &info.logical_nulls {
                 Some(nulls) => {
                     assert!(range.end <= nulls.len());
-                    let nulls = nulls.inner();
-                    info.def_levels.extend_from_iter(range.clone().map(|i| {
-                        // Safety: range.end was asserted to be in bounds earlier
-                        let valid = unsafe { nulls.value_unchecked(i) };
-                        max_def_level - (!valid as i16)
-                    }));
-                    info.non_null_indices.reserve(len);
-                    info.non_null_indices.extend(
-                        BitIndexIterator::new(nulls.inner(), nulls.offset() + range.start, len)
-                            .map(|i| i + range.start),
-                    );
+                    // Bulk-fill fast path. Gated on:
+                    //   - `len >= 64`: per-call slice/popcount/iterator overhead only
+                    //     amortizes on sizable sub-ranges. List/struct paths call
+                    //     write_leaf many times with tiny ranges (avg list length 1-5);
+                    //     paying any per-call popcount there would regress them. A
+                    //     threshold sweep at T={0,16,32,64,128,256} on Ryzen 9 9950X
+                    //     shows the regression floor settles by T=32 and the choice of
+                    //     64 gives ~12x margin over avg list length without losing the
+                    //     flat-primitive wins.
+                    //   - `nulls.null_count() * 2 >= nulls.len()`: cached `null_count()`
+                    //     is O(1), so this check is free. We use the buffer-level density
+                    //     as a heuristic for the sub-range; for full-array writes (the
+                    //     primary target — flat primitive columns) it's exact.
+                    // Note: even when this gate skips the fast path, evaluating the gate
+                    // itself across high-frequency call sites (~10K calls in some list
+                    // benchmarks) is a small structural cost (~+1-2% on list-sparse
+                    // cases). It's the price of having any gate at all on this hot path;
+                    // reducing it further would require hoisting the decision into the
+                    // caller. The wins on the targeted shapes (-35% sparse-primitive,
+                    // -66% all-null primitive) far outweigh it.
+                    if len >= 64 && nulls.null_count() * 2 >= nulls.len() {

Review Comment:
   Please replace `64` with a named constant. The constant's documentation can point to this PR for an explanation of how 64 was arrived at.
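
   A minimal sketch of how that suggestion could look. The names `BULK_FILL_MIN_LEN` and `use_bulk_fill` are illustrative assumptions, not code from this PR; only the gate expression mirrors the `if` shown in the diff above.

   ```rust
   /// Minimum sub-range length for the bulk-fill fast path.
   ///
   /// Chosen via a threshold sweep; see apache/arrow-rs#9967 for how
   /// this value was arrived at. (Constant name is illustrative.)
   const BULK_FILL_MIN_LEN: usize = 64;

   /// Illustrative standalone gate: bulk-fill only pays off for
   /// sizable, null-dense sub-ranges.
   fn use_bulk_fill(len: usize, null_count: usize, total_len: usize) -> bool {
       len >= BULK_FILL_MIN_LEN && null_count * 2 >= total_len
   }

   fn main() {
       // Large, half-null buffer: the fast path is taken.
       assert!(use_bulk_fill(128, 64, 128));
       // Tiny range (typical list element): the fast path is skipped.
       assert!(!use_bulk_fill(4, 4, 8));
       // Large range but mostly-valid buffer: density check fails.
       assert!(!use_bulk_fill(128, 10, 128));
   }
   ```

   Using a `const` with a doc comment keeps the tuning rationale discoverable from the call site without inlining the full benchmark narrative into the hot loop.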



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
