Dandandan opened a new pull request, #21579:
URL: https://github.com/apache/datafusion/pull/21579

   ## Which issue does this PR close?
   
   N/A - Performance optimization
   
   ## Rationale for this change
   
   The `vectorized_append_inner` method in `ByteViewGroupValueBuilder` was 
using `make_view` to reconstruct views from scratch (re-reading value bytes to 
build the prefix) and copying non-inline data one row at a time in a `for` 
loop. Both are unnecessary since the input array already has correctly formed 
views.
   
   ## What changes are included in this PR?
   
   - **View reuse**: Copy input views directly instead of reconstructing them via 
`make_view`. For inline values (len ≤ 12), the view is copied as-is. For 
non-inline values, the view's length and prefix are preserved; only 
`buffer_index`/`offset` are updated via `ByteView`'s builder API.
   - **Use `extend`** instead of `for` loops for both `Nulls::None` and 
`Nulls::Some` branches.
   - **Doubling allocation strategy** for `in_progress` buffer: starts at 2MB, 
doubles on each flush up to 32MB max. Reduces buffer flush frequency for large 
workloads.
   - Extract shared `copy_view` helper used by both `append_val_inner` and 
`vectorized_append_inner`.
   - Remove now-unused `do_append_val_inner`, `ensure_in_progress_big_enough`, 
and `make_view` import.
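
   The view-reuse idea can be sketched as follows. This is a minimal, illustrative
version (not the DataFusion code), assuming only the Arrow string-view layout: a
128-bit view whose low 32 bits are the length; values of len ≤ 12 store their bytes
inline, while longer values store a 4-byte prefix, a u32 buffer index, and a u32
offset. The `copy_view` name mirrors the helper described above, but the body here
manipulates a raw `u128` rather than `ByteView`'s builder API:

   ```rust
   const INLINE_LEN: u32 = 12;

   /// Copy a view, retargeting only buffer_index/offset for non-inline values.
   /// Inline views carry their whole payload, so they are copied unchanged;
   /// for non-inline views the length and prefix (low 64 bits) are reused and
   /// only the buffer index (bits 64..96) and offset (bits 96..128) are replaced.
   fn copy_view(view: u128, new_buffer_index: u32, new_offset: u32) -> u128 {
       let len = view as u32;
       if len <= INLINE_LEN {
           view
       } else {
           let len_and_prefix = view & 0xFFFF_FFFF_FFFF_FFFF;
           len_and_prefix
               | ((new_buffer_index as u128) << 64)
               | ((new_offset as u128) << 96)
       }
   }

   fn main() {
       // Non-inline view: len = 20, prefix "abcd", buffer 3, offset 100.
       let prefix = u32::from_le_bytes(*b"abcd");
       let view = 20u128 | ((prefix as u128) << 32) | (3u128 << 64) | (100u128 << 96);
       let copied = copy_view(view, 7, 0);
       assert_eq!(copied as u32, 20);             // length preserved
       assert_eq!((copied >> 32) as u32, prefix); // prefix preserved
       assert_eq!((copied >> 64) as u32, 7);      // buffer index updated
       assert_eq!((copied >> 96) as u32, 0);      // offset updated
       println!("ok");
   }
   ```

   The point of the sketch is why this beats `make_view`: the prefix already sits in
the input view, so there is no need to re-read the value bytes to rebuild it.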
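
   The doubling allocation strategy can likewise be sketched. This is a hedged,
self-contained illustration of the growth policy described above (start at 2 MB,
double on each flush, cap at 32 MB); the `InProgress` type and its fields are
hypothetical names, not the actual `ByteViewGroupValueBuilder` internals:

   ```rust
   const INITIAL_CAPACITY: usize = 2 * 1024 * 1024;  // 2 MB starting buffer
   const MAX_CAPACITY: usize = 32 * 1024 * 1024;     // 32 MB growth cap

   struct InProgress {
       buf: Vec<u8>,
       next_capacity: usize,
   }

   impl InProgress {
       fn new() -> Self {
           Self {
               buf: Vec::with_capacity(INITIAL_CAPACITY),
               next_capacity: INITIAL_CAPACITY,
           }
       }

       /// Hand off the full buffer and start a new one with doubled
       /// capacity, capped at MAX_CAPACITY.
       fn flush(&mut self) -> Vec<u8> {
           self.next_capacity = (self.next_capacity * 2).min(MAX_CAPACITY);
           std::mem::replace(&mut self.buf, Vec::with_capacity(self.next_capacity))
       }
   }

   fn main() {
       const MB: usize = 1024 * 1024;
       let mut ip = InProgress::new();
       assert_eq!(ip.next_capacity, 2 * MB);
       let caps: Vec<usize> = (0..6).map(|_| { ip.flush(); ip.next_capacity }).collect();
       // Capacity doubles per flush until it saturates at 32 MB.
       assert_eq!(caps, vec![4 * MB, 8 * MB, 16 * MB, 32 * MB, 32 * MB, 32 * MB]);
       println!("ok");
   }
   ```

   Geometric growth with a cap amortizes allocation cost while bounding peak memory,
which is why it reduces flush frequency on large workloads without unbounded buffers.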
   
   ## Benchmark results
   
   ```
   inline_null_0.0_size_1000/vectorized_append (8B strings, all inline)
                           time:   [2.4193 µs 2.4634 µs 2.5072 µs]
                           change: [−47.891% −46.365% −44.823%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   scenario_null_0.0_size_1000/vectorized_append (64B strings, non-inline)
                           time:   [6.9834 µs 7.2868 µs 7.6747 µs]
                           change: [−37.848% −35.290% −32.429%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   random_null_0.0_size_1000/vectorized_append (up to 400B strings, mixed)
                           time:   [16.289 µs 16.863 µs 17.456 µs]
                           change: [−17.687% −14.685% −11.971%] (p = 0.00 < 0.05)
                           Performance has improved.
   ```
   
   | Case | Before | After | Improvement |
   |------|--------|-------|-------------|
   | inline (8B) | 4.29 µs | 2.46 µs | **-46%** |
   | scenario (64B) | 11.43 µs | 7.29 µs | **-36%** |
   | random (≤400B) | 20.61 µs | 16.86 µs | **-18%** |
   
   ## Are these changes tested?
   
   Existing tests cover the functionality (6/6 pass). Verified with `cargo 
check`, `cargo clippy`, and `cargo test`.
   
   ## Are there any user-facing changes?
   
   No - internal performance optimization only.
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

