RyanJamesStewart commented on PR #9967:
URL: https://github.com/apache/arrow-rs/pull/9967#issuecomment-4441977249

   Pushed a fmt fixup (the `BitIndexIterator::new(...)` call on the new path 
was wrapped wider than rustfmt wanted). The Rust workflow should go green now.
   
   On the bench: the run on `7341370` flagged `list_primitive` / 
`list_primitive_sparse_99pct_null` at +8% to +12%. The cost there was the 
per-range `count_set_bits_offset` plus the under-reserved scatter, paid on 
every `write_leaf` call from `write_list` to `write_non_null_slice`, where 
ranges average ~5 elements. The follow-up commit gates the new path on 
`len >= 64 && null_count() * 2 >= len`, using the cached `null_count` so 
there is no per-range popcount when global density is low. On my sweep that 
brought the list regression down to ~+1.7%, which is the structural cost of 
evaluating the gate across ~10K calls; reducing it further would mean 
hoisting the decision up into `write_list`.
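   For reference, the gate can be sketched as a standalone predicate (a 
minimal illustration of the condition above; `should_use_null_mask_path` is a 
hypothetical name, not the actual helper in this PR):

```rust
/// Illustrative sketch of the density gate: take the new path only when the
/// slice is long enough to amortize per-range setup AND at least half of its
/// values are null. `null_count` is the array's cached null count, so no
/// per-range popcount is needed when global density is low.
fn should_use_null_mask_path(len: usize, null_count: usize) -> bool {
    // Short ranges (e.g. the ~5-element list ranges from the benchmark)
    // fall through to the old path immediately on the `len` check.
    len >= 64 && null_count * 2 >= len
}

fn main() {
    // Short, fully-null range: old path (fails the length check).
    assert!(!should_use_null_mask_path(5, 5));
    // Long but dense range: old path (fails the density check).
    assert!(!should_use_null_mask_path(1024, 10));
    // Long, >= 50% null range: new path.
    assert!(should_use_null_mask_path(1024, 512));
}
```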
   
   @etseidl could you re-run `run benchmark arrow_writer` once CI is green? 
It would be good to have the post-gate numbers on record before further review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
