EeshanBembi opened a new pull request, #22907: URL: https://github.com/apache/datafusion/pull/22907
## Which issue does this PR close? Follow-up to #21794, addressing review feedback from @alamb and @Dandandan. ## Rationale for this change In the review of #21794, several optimizations were suggested: - **@alamb** ([comment](https://github.com/apache/datafusion/pull/21794#discussion_r2174375498)): "as a possible future optimization, we could use `get_unchecked` here if it makes any difference" — referring to the slow-path `arr.views()[row]` access. - **@alamb** ([comment](https://github.com/apache/datafusion/pull/21794#discussion_r2174376693)): "from here on down I think this is basically the same as append_val_inner -- if there are any differences perhaps we can fold it into append_val_inner and avoid the copy" - **@Dandandan** ([comment](https://github.com/apache/datafusion/pull/21794#discussion_r2175204095)): "In principle we can make this faster as well - `extend` + reuse input view (instead of make_view) + avoid `array.value(row)`" ## What changes are included in this PR? **1. Refactored `do_append_val_inner` to use raw view access** Replaced `array.value(row)` + `make_view()` with raw view access via `get_unchecked(row)`: - **Inline (len <= 12):** push the u128 view as-is — no decode/re-encode round-trip - **Non-inline (len > 12):** parse via `ByteView::from(view)`, copy buffer data, reuse source prefix directly (avoids re-reading first 4 bytes) **2. Simplified the vectorized slow path** Replaced the duplicated 28-line loop body in `vectorized_append_inner` with `try_reserve` + a loop calling `do_append_val_inner`, eliminating code duplication. **3. Removed unused `make_view` import** ### Safety notes - **`get_unchecked` usage**: Consistent with `do_equal_to_inner` (same file) and `PrimitiveGroupValueBuilder` in `primitive.rs`, both of which use the same pattern. All callers derive row indices from enumeration over the input array length, guaranteeing validity. - **Buffer access safety**: When `data_buffers()` is empty, all views must have len <= 12 (Arrow invariant), so the non-inline branch is never entered. ## Are these changes tested? Covered by 6 existing unit tests in the `bytes_view` module plus 3 integration tests in the `multi_group_by` module. All 111 tests in the aggregates suite pass. ## Are there any user-facing changes? No. This is an internal refactor with no API changes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
