Re: [PR] Adds partial `cast` support for run-end encoded arrays [arrow-rs]

via GitHub Tue, 19 Nov 2024 11:26:27 -0800


RyanMarcus commented on PR #6752:
URL: https://github.com/apache/arrow-rs/pull/6752#issuecomment-2486570327


   I've added a more efficient `extend_n` function that works by passing the 
count parameter `n` to each closure. But I realized that in the worst-case 
scenario, where every run has length 1, this approach still pays the 
interpretation overhead for every value.
   
   A simple benchmark converting a REE array of i32s to a primitive array:
   
   ```
   With PrimitiveBuilder:
   cast run end to flat    time:   [34.395 ms 34.453 ms 34.519 ms]
   
   With MutableArrayData:
   cast run end to flat    time:   [48.710 ms 48.869 ms 49.067 ms]
   ```
   
   ... so avoiding the interpretation overhead seems to give roughly a 30% 
speedup (34.5 ms vs. 48.9 ms median). Does a 30% performance improvement in the worst 
case justify the extra codegen, @tustvold?
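A "30%" figure can be read in two ways, so a quick check on the median times helps. The PrimitiveBuilder path takes about 29.5% less time. Equivalently, the MutableArrayData path takes about 1.42× as long. The constant and function names below are illustrative only.

```rust
/// Median times in milliseconds, copied from the benchmark output above.
pub const PRIMITIVE_BUILDER_MS: f64 = 34.453;
pub const MUTABLE_ARRAY_DATA_MS: f64 = 48.869;

/// Fraction of wall-clock time saved by the faster variant.
pub fn runtime_reduction(fast_ms: f64, slow_ms: f64) -> f64 {
    1.0 - fast_ms / slow_ms
}

/// How many times longer the slower variant takes.
pub fn slowdown_factor(fast_ms: f64, slow_ms: f64) -> f64 {
    slow_ms / fast_ms
}
```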


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.