Re: [PR] Adds partial `cast` support for run-end encoded arrays [arrow-rs]

via GitHub Tue, 19 Nov 2024 15:06:11 -0800


RyanMarcus commented on PR #6752:
URL: https://github.com/apache/arrow-rs/pull/6752#issuecomment-2486929566


   My benchmark is a worst-case scenario, so every run length is 1, and thus 
the average is 1 as well. Not a realistic scenario, but illustrative of the 
worst case.
   
   If we increase all run lengths to 10 (which is the average in my 
application, at least) while keeping the logical data size the same, the 
results are:
   
   ```
   With PrimitiveBuilder:
   cast run end to flat    time:   [11.740 ms 11.778 ms 11.818 ms]
   
   With MutableArrayData:
   cast run end to flat    time:   [21.837 ms 21.917 ms 22.000 ms]
   ```
   
   Both approaches get faster, but the relative gap is larger.
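
   For illustration, the two inputs being compared can be described by their 
run-end arrays. The following is a minimal sketch, not the benchmark code from 
this PR: `uniform_run_ends` is a hypothetical helper, and it assumes `i32` run 
ends that store the exclusive logical end index of each run.

   ```rust
   /// Hypothetical helper (not part of arrow-rs): build the run-end array for
   /// `logical_len` logical values where every run has length `run_len`.
   /// The final run is shorter if `run_len` does not divide `logical_len`.
   fn uniform_run_ends(logical_len: usize, run_len: usize) -> Vec<i32> {
       assert!(run_len > 0, "run length must be positive");
       // Number of runs, rounded up.
       let n_runs = (logical_len + run_len - 1) / run_len;
       let mut ends = Vec::with_capacity(n_runs);
       let mut end = 0usize;
       while end < logical_len {
           // Each entry is the exclusive logical end index of its run.
           end = (end + run_len).min(logical_len);
           ends.push(end as i32);
       }
       ends
   }
   ```

   With the logical length held fixed, a run length of 1 produces one physical 
entry per logical value, while a run length of 10 produces a tenth as many.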
   
   The current version hits the compromise you mentioned: the specialized 
kernel is used for {i/u/f}{8/16/32/64}, and the interpretation-powered one is 
used for the rest.
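
   As a rough picture of what a specialized kernel for primitive types does, 
here is a minimal sketch in plain Rust. It is not the code from this PR and 
does not use the arrow-rs builders: `expand_runs` is a hypothetical name, and 
it assumes strictly increasing `i32` run ends with no nulls.

   ```rust
   /// Hypothetical sketch: expand a run-end encoded column into a flat vector.
   /// `run_ends[i]` is the exclusive logical end index of run `i`, and
   /// `values[i]` is the value repeated across that run.
   fn expand_runs<T: Copy>(run_ends: &[i32], values: &[T]) -> Vec<T> {
       assert_eq!(run_ends.len(), values.len(), "one value per run");
       // The last run end is the total logical length.
       let total = run_ends.last().copied().unwrap_or(0) as usize;
       let mut out = Vec::with_capacity(total);
       let mut start = 0usize;
       for (&end, &v) in run_ends.iter().zip(values) {
           let end = end as usize;
           // Assumes run ends are strictly increasing, so `end > start`.
           out.extend(std::iter::repeat(v).take(end - start));
           start = end;
       }
       out
   }
   ```

   For example, `expand_runs(&[2, 5], &[7i64, 9])` yields `[7, 7, 9, 9, 9]`. 
Because the element type is known at compile time, the inner loop is a plain 
typed copy rather than a per-run dispatch over the array's data type.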
   
   I am not fully confident that I correctly modified all of the 
`MutableArrayData` internals, but the tests at least pass. If you want, I can 
swap back to the "dummy" version I proposed earlier, since the specialized 
kernel should cover the most common cases.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
