neilconway opened a new pull request, #20693: URL: https://github.com/apache/datafusion/pull/20693
## Which issue does this PR close?

N/A

## Rationale for this change

Several array set operations (`array_union`, `array_intersect`, `array_distinct`, `array_except`) operate on all values in the underlying values buffer of a `ListArray` when doing batched row conversion. For sliced ListArrays, `values()` returns the full underlying buffer, which means we end up doing row conversion for rows that aren't in the visible slice. This was not a correctness issue but it is inefficient.

## What changes are included in this PR?

- Change array set ops to do row conversion on the visible slice, not the full values buffer
- Add unit tests for array set ops on sliced ListArrays. These tests pass with or without this PR, but it seems wise to have more test coverage for sliced ListArrays
- Add benchmarks for array set ops on sliced ListArrays.

## Are these changes tested?

Yes.

## Are there any user-facing changes?

No.

## AI usage

Multiple AI tools were used to iterate on this PR. I have reviewed and understand the resulting code.
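To illustrate the rationale above, here is a minimal, stdlib-only sketch of why a sliced list array can reference far fewer values than its underlying buffer holds. `ListView`, `visible_values`, and `distinct_rows` are hypothetical names made up for this illustration; they are a simplified stand-in, not the arrow-rs `ListArray` API and not the code changed in this PR. The idea is that slicing only narrows the offsets window, so the first and last visible offsets bound the part of the values buffer that actually needs processing.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified stand-in for a list array: row `i` holds
/// `values[offsets[i]..offsets[i + 1]]`. This is NOT the arrow-rs type.
struct ListView<'a> {
    values: &'a [i32],
    offsets: &'a [usize],
}

impl<'a> ListView<'a> {
    /// Number of visible rows.
    fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Zero-copy slice of rows: only the offsets window changes,
    /// while `values` still refers to the full buffer.
    fn slice(&self, start: usize, len: usize) -> ListView<'a> {
        ListView {
            values: self.values,
            offsets: &self.offsets[start..=start + len],
        }
    }

    /// The part of `values` that the visible rows actually reference.
    fn visible_values(&self) -> &'a [i32] {
        let first = self.offsets[0];
        let last = self.offsets[self.offsets.len() - 1];
        &self.values[first..last]
    }
}

/// Per-row distinct that touches only the visible values. Row boundaries
/// are rebased by subtracting the first visible offset.
fn distinct_rows(list: &ListView) -> Vec<Vec<i32>> {
    let visible = list.visible_values();
    let base = list.offsets[0];
    list.offsets
        .windows(2)
        .map(|w| {
            let mut seen = HashSet::new();
            visible[w[0] - base..w[1] - base]
                .iter()
                .copied()
                .filter(|v| seen.insert(*v))
                .collect::<Vec<i32>>()
        })
        .collect()
}

fn main() {
    // Four rows: [1, 1, 2], [3, 3], [4, 5, 4], [6]
    let values: [i32; 9] = [1, 1, 2, 3, 3, 4, 5, 4, 6];
    let offsets: [usize; 5] = [0, 3, 5, 8, 9];
    let full = ListView { values: &values, offsets: &offsets };

    // Keep only rows 1 and 2.
    let sliced = full.slice(1, 2);
    println!("rows: {}", sliced.len()); // rows: 2
    println!("full buffer: {} values", sliced.values.len()); // 9
    println!("visible: {} values", sliced.visible_values().len()); // 5
    println!("{:?}", distinct_rows(&sliced)); // [[3], [4, 5]]
}
```

In this toy model the result is the same whether the full buffer or only the visible range is processed, which matches the note that this is an efficiency issue rather than a correctness one. The saving grows with how much of the buffer lies outside the slice.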
