neilconway opened a new pull request, #20693:
URL: https://github.com/apache/datafusion/pull/20693

   ## Which issue does this PR close?
   
   N/A
   
   ## Rationale for this change
   
   Several array set operations (`array_union`, `array_intersect`, 
`array_distinct`, `array_except`) run batched row conversion over every value 
in the underlying values buffer of a `ListArray`. For a sliced `ListArray`, 
`values()` returns the full underlying buffer, so we end up doing row 
conversion for values that are outside the visible slice. This is not a 
correctness issue, but it is inefficient.
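
To illustrate the point above, here is a minimal std-only sketch of why the values of a sliced list array cover more than the visible rows. `SimpleListArray` and its methods are hypothetical stand-ins written for this example; they are not the arrow-rs or DataFusion API, and the PR's actual code may differ.

```rust
use std::ops::Range;

/// Hypothetical, simplified model of a list array (not the arrow-rs type).
/// `offsets` has one more entry than there are rows; row `i` covers
/// `values[offsets[i]..offsets[i + 1]]`.
struct SimpleListArray {
    values: Vec<i32>,
    offsets: Vec<usize>,
}

impl SimpleListArray {
    /// Slicing narrows only the offsets; the values buffer keeps its full
    /// length. (A real implementation would share the buffer, not clone it.)
    fn slice(&self, start: usize, len: usize) -> SimpleListArray {
        SimpleListArray {
            values: self.values.clone(),
            offsets: self.offsets[start..=start + len].to_vec(),
        }
    }

    /// Range of `values` actually referenced by the visible rows.
    /// Assumes `offsets` is non-empty.
    fn visible_range(&self) -> Range<usize> {
        let first = self.offsets[0];
        let last = self.offsets[self.offsets.len() - 1];
        first..last
    }

    /// Only these values need to go through row conversion.
    fn visible_values(&self) -> &[i32] {
        &self.values[self.visible_range()]
    }
}
```

In this model, slicing rows 1..3 out of a four-row array with eight values leaves `values` at length eight, while `visible_values()` returns only the four values those two rows reference. Any per-row offsets used afterwards would need to be rebased by subtracting the first offset.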
   
   ## What changes are included in this PR?
   
   - Change array set ops to do row conversion on the visible slice, not the 
full values buffer
   - Add unit tests for array set ops on sliced ListArrays. These tests pass 
with or without this PR, but it seems wise to have more test coverage for 
sliced ListArrays
   - Add benchmarks for array set ops on sliced ListArrays.
   
   ## Are these changes tested?
   
   Yes.
   
   ## Are there any user-facing changes?
   
   No.
   
   ## AI usage
   
   Multiple AI tools were used to iterate on this PR. I have reviewed and 
understand the resulting code.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

