Omega359 commented on PR #12027:
URL: https://github.com/apache/datafusion/pull/12027#issuecomment-2295332991

   I went ahead and wrote a benchmark to verify my assumptions about 
performance: 
https://github.com/Omega359/arrow-datafusion/blob/feature/string_arrays/datafusion/functions/benches/string_arrays.rs
   
   Here are the results on my machine: a 16k-element (16384) array, 20% nulls, 
each value 32 characters long. Times when using an iterator are almost 
equivalent across the approaches tested. Loops using .value(idx) vary much 
more: the direct approach is fastest (~27.8 µs), followed by StringArrayType 
(~44.9 µs), then the StringArrays approach (~72.1 µs). 
   
   ```
   string_arrays benchmark/StringArrays-iter-16384/32
                           time:   [18.011 µs 18.093 µs 18.172 µs]
   Found 8 outliers among 100 measurements (8.00%)
     2 (2.00%) low mild
     4 (4.00%) high mild
     2 (2.00%) high severe
   string_arrays benchmark/StringArrays-using_loop-16384/32
                           time:   [71.839 µs 72.050 µs 72.291 µs]
   Found 10 outliers among 100 measurements (10.00%)
     4 (4.00%) low mild
     5 (5.00%) high mild
     1 (1.00%) high severe
   string_arrays benchmark/direct-iter-16384/32
                           time:   [16.597 µs 16.654 µs 16.718 µs]
   Found 8 outliers among 100 measurements (8.00%)
     4 (4.00%) high mild
     4 (4.00%) high severe
   string_arrays benchmark/direct-using_loop-16384/32
                           time:   [27.554 µs 27.763 µs 27.989 µs]
   Found 6 outliers among 100 measurements (6.00%)
     1 (1.00%) low mild
     1 (1.00%) high mild
     4 (4.00%) high severe
   string_arrays benchmark/StringArrayType-iter-16384/32
                           time:   [18.375 µs 18.752 µs 19.175 µs]
   Found 5 outliers among 100 measurements (5.00%)
     1 (1.00%) low mild
     4 (4.00%) high mild
   string_arrays benchmark/StringArrayType-using_loop-16384/32
                           time:   [44.859 µs 44.946 µs 45.042 µs]
   Found 7 outliers among 100 measurements (7.00%)
     4 (4.00%) high mild
     3 (3.00%) high severe
   ```
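
For readers unfamiliar with the two access patterns being compared, here is a minimal sketch of "iter" versus "loop with .value(idx)". It is not the benchmark code linked above and does not use the arrow crate: `ToyStringArray`, `total_len_iter`, and `total_len_loop` are hypothetical names, and the struct is only a simplified stand-in (one contiguous buffer, offsets, a validity mask) for how a string array is laid out.

```rust
/// Hypothetical, simplified stand-in for a string array: a single
/// contiguous buffer, `len + 1` offsets into it, and a validity mask.
/// This is an illustration only, not the arrow-rs implementation.
struct ToyStringArray {
    data: String,
    offsets: Vec<usize>,
    valid: Vec<bool>,
}

impl ToyStringArray {
    fn from_options(items: &[Option<&str>]) -> Self {
        let mut data = String::new();
        let mut offsets = vec![0];
        let mut valid = Vec::with_capacity(items.len());
        for item in items {
            if let Some(s) = item {
                data.push_str(s);
            }
            // A null slot gets an empty range: same start and end offset.
            offsets.push(data.len());
            valid.push(item.is_some());
        }
        Self { data, offsets, valid }
    }

    fn len(&self) -> usize {
        self.valid.len()
    }

    fn is_null(&self, idx: usize) -> bool {
        !self.valid[idx]
    }

    /// Returns the slot's string regardless of validity (empty for nulls).
    fn value(&self, idx: usize) -> &str {
        &self.data[self.offsets[idx]..self.offsets[idx + 1]]
    }

    /// Yields `None` for null slots and `Some(value)` otherwise.
    fn iter(&self) -> impl Iterator<Item = Option<&str>> + '_ {
        (0..self.len()).map(move |i| {
            if self.valid[i] {
                Some(self.value(i))
            } else {
                None
            }
        })
    }
}

/// "iter" pattern: the iterator handles the null check.
fn total_len_iter(arr: &ToyStringArray) -> usize {
    arr.iter().flatten().map(str::len).sum()
}

/// "using_loop" pattern: explicit index loop with is_null + value(idx).
fn total_len_loop(arr: &ToyStringArray) -> usize {
    let mut total = 0;
    for idx in 0..arr.len() {
        if !arr.is_null(idx) {
            total += arr.value(idx).len();
        }
    }
    total
}
```

Both functions compute the same result; the benchmark numbers above concern how the cost of each pattern changes depending on whether the array is accessed directly or through an abstraction layer, which this toy model does not attempt to reproduce.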


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

