Re: [PR] fix: take_bytes should not reuse indices null buffer [arrow-rs]

via GitHub Wed, 23 Oct 2024 10:42:40 -0700


tustvold commented on PR #6616:
URL: https://github.com/apache/arrow-rs/pull/6616#issuecomment-2432794522


   > For example, you try to modify the indices array after calling take. In 
some cases you can, because the array is not cloned, but in other cases you 
cannot, because the array is cloned.
   > I don't think this is good design for kernel behavior.
   
   The contract of a kernel should be on the semantic value of its output; this 
leaves kernels free to implement physical layout optimisations such as this 
one. I don't believe we ever document or articulate any contract on how 
kernels should behave w.r.t. their inputs. Doing so would be not only very 
restrictive but also extremely fragile. Take is far from the only kernel that 
will simply clone input buffers, especially if one considers types like 
DictionaryArray, where the underlying dictionary may or may not be recomputed 
depending on heuristics.
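
   To illustrate the point about semantic versus physical contracts, here is a 
minimal std-only sketch. `ToyArray`, `ToyIndices`, `take_sharing` and 
`take_copying` are hypothetical names invented for this example and are not 
arrow-rs APIs; an `Arc<Vec<bool>>` stands in for a reference-counted null 
buffer. Both functions return equal arrays, and only the physical sharing of 
the validity mask differs.

```rust
use std::sync::Arc;

/// Hypothetical, simplified array: values plus a reference-counted validity mask.
#[derive(Debug, Clone, PartialEq)]
struct ToyArray {
    values: Vec<i32>,
    validity: Arc<Vec<bool>>,
}

/// Hypothetical indices array: positions plus its own validity mask.
struct ToyIndices {
    positions: Vec<usize>,
    validity: Arc<Vec<bool>>,
}

/// Gather values at the given positions; null slots get a placeholder of 0.
fn gather(values: &[i32], indices: &ToyIndices) -> Vec<i32> {
    indices
        .positions
        .iter()
        .zip(indices.validity.iter())
        .map(|(&pos, &valid)| if valid { values[pos] } else { 0 })
        .collect()
}

/// Shares the indices' validity mask with the output (no new allocation).
fn take_sharing(values: &[i32], indices: &ToyIndices) -> ToyArray {
    ToyArray {
        values: gather(values, indices),
        validity: Arc::clone(&indices.validity),
    }
}

/// Copies the validity mask into a fresh allocation.
fn take_copying(values: &[i32], indices: &ToyIndices) -> ToyArray {
    ToyArray {
        values: gather(values, indices),
        validity: Arc::new((*indices.validity).clone()),
    }
}
```

   A caller that compares only the outputs cannot tell the two apart; the 
difference shows up only if it inspects pointer identity, for example with 
`Arc::ptr_eq`.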
   
   From the take kernel's perspective, it has no way to know that you want to 
reuse the null buffer that it was given, so the correct thing is for it not to 
create a fresh allocation unnecessarily. If calling code then wants to reuse 
the buffer, it can try to, falling back to performing the allocation if 
necessary. This is safe, sound and optimal.
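
   The "try to reuse, otherwise allocate" pattern can be sketched with the 
standard library alone. This is only an analogy under stated assumptions: 
`SharedBytes`, `into_owned_or_copy` and `reuse_demo` are hypothetical names, 
and `Arc<Vec<u8>>` stands in for a reference-counted buffer. If I recall 
correctly, arrow-rs exposes a similar fallible conversion as 
`Buffer::into_mutable`, but the sketch does not depend on it.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a reference-counted null buffer.
type SharedBytes = Arc<Vec<u8>>;

/// Take ownership of the allocation if we hold the only reference;
/// otherwise fall back to copying the bytes into a fresh allocation.
fn into_owned_or_copy(buf: SharedBytes) -> Vec<u8> {
    Arc::try_unwrap(buf).unwrap_or_else(|still_shared| (*still_shared).clone())
}

/// Returns (reused_when_unique, copied_when_shared).
fn reuse_demo() -> (bool, bool) {
    // Case 1: sole owner, so the existing allocation is reused.
    let unique: SharedBytes = Arc::new(vec![0b1011]);
    let unique_ptr = unique.as_ptr();
    let owned = into_owned_or_copy(unique);
    let reused = owned.as_ptr() == unique_ptr;

    // Case 2: another handle is still alive, so we must copy.
    let shared: SharedBytes = Arc::new(vec![0b1011]);
    let _other = Arc::clone(&shared);
    let shared_ptr = shared.as_ptr();
    let copied = into_owned_or_copy(shared);
    let was_copied = copied.as_ptr() != shared_ptr;

    (reused, was_copied)
}
```

   Either way the caller ends up with an owned, mutable buffer; whether an 
allocation happened depends only on whether someone else still holds a 
reference.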


