tustvold commented on PR #4819:
URL: https://github.com/apache/arrow-rs/pull/4819#issuecomment-1721475833

   > do we have any coverage for convert "low" cardinality dictionaries
   
   This is a very good point. I've added some benchmarks with "low" cardinality dictionaries containing 10 unique values. The performance gain from this PR is smaller here, but still sizeable, with `convert_rows` improving by roughly 42-43%, especially in the case where the values haven't been seen before.
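
To make "low" cardinality concrete, here is a standard-library-only Rust sketch of a dictionary with the shape described above. It assumes that the `4096` in the benchmark names is the row count, and that the dictionary uses a simple keys-plus-values layout. `LowCardinalityDict`, `build_low_cardinality`, the zero-padded values, and the LCG used to pick keys are all hypothetical. None of them is the arrow-rs benchmark code or the `DictionaryArray` API.

```rust
use std::collections::HashSet;

/// Hypothetical dictionary layout (not arrow-rs's `DictionaryArray`): each row
/// stores a small integer key indexing into a shared pool of distinct strings.
struct LowCardinalityDict {
    values: Vec<String>,
    keys: Vec<i32>,
}

impl LowCardinalityDict {
    /// Decodes row `row` by looking its key up in the value pool.
    fn value(&self, row: usize) -> &str {
        &self.values[self.keys[row] as usize]
    }
}

/// Builds `len` rows drawn from `distinct` values, each `str_len` characters
/// long. Keys come from a small deterministic LCG so no external crates are needed.
fn build_low_cardinality(len: usize, distinct: usize, str_len: usize) -> LowCardinalityDict {
    // Zero-padded indices ("0000000003", ...) give `distinct` unique strings
    // as long as `str_len` has enough digits to represent every index.
    let values: Vec<String> = (0..distinct)
        .map(|i| format!("{:0width$}", i, width = str_len))
        .collect();

    // Hypothetical key generator: a 64-bit LCG, reduced modulo `distinct`.
    let mut state: u64 = 42;
    let keys: Vec<i32> = (0..len)
        .map(|_| {
            state = state
                .wrapping_mul(6364136223846793005)
                .wrapping_add(1442695040888963407);
            ((state >> 33) % distinct as u64) as i32
        })
        .collect();

    LowCardinalityDict { values, keys }
}

fn main() {
    // Assumed shape: 4096 rows referencing only 10 unique string values.
    let dict = build_low_cardinality(4096, 10, 10);
    let distinct_seen: HashSet<&str> = (0..dict.keys.len()).map(|r| dict.value(r)).collect();
    println!(
        "{} rows, {} dictionary values, {} distinct values referenced",
        dict.keys.len(),
        dict.values.len(),
        distinct_seen.len()
    );
}
```

In this layout only 10 strings of 10 bytes are stored, shared by all 4096 rows through their keys.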
   
   ```
   convert_columns 4096 string_dictionary_low_cardinality(10, 0)
                           time:   [38.948 µs 38.956 µs 38.966 µs]
                           change: [-7.6416% -7.3418% -6.9125%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 6 outliers among 100 measurements (6.00%)
     3 (3.00%) high mild
     3 (3.00%) high severe
   
   convert_columns_prepared 4096 string_dictionary_low_cardinality(10, 0)
                           time:   [38.230 µs 38.244 µs 38.261 µs]
                           change: [-7.1958% -6.9091% -6.6087%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 4 outliers among 100 measurements (4.00%)
     1 (1.00%) high mild
     3 (3.00%) high severe
   
   convert_rows 4096 string_dictionary_low_cardinality(10, 0)
                           time:   [61.243 µs 61.258 µs 61.278 µs]
                           change: [-42.531% -42.345% -42.158%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 7 outliers among 100 measurements (7.00%)
     2 (2.00%) low mild
     1 (1.00%) high mild
     4 (4.00%) high severe
   
   convert_columns 4096 string_dictionary_low_cardinality(30, 0)
                           time:   [39.168 µs 39.175 µs 39.183 µs]
                           change: [-11.201% -11.002% -10.889%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 9 outliers among 100 measurements (9.00%)
     4 (4.00%) high mild
     5 (5.00%) high severe
   
   convert_columns_prepared 4096 string_dictionary_low_cardinality(30, 0)
                           time:   [38.485 µs 38.493 µs 38.500 µs]
                           change: [-7.6690% -7.4510% -7.3212%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 6 outliers among 100 measurements (6.00%)
     1 (1.00%) low mild
     3 (3.00%) high mild
     2 (2.00%) high severe
   
   convert_rows 4096 string_dictionary_low_cardinality(30, 0)
                           time:   [61.568 µs 61.584 µs 61.605 µs]
                           change: [-42.307% -42.220% -42.080%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 9 outliers among 100 measurements (9.00%)
     2 (2.00%) high mild
     7 (7.00%) high severe
   
   convert_columns 4096 string_dictionary_low_cardinality(100, 0)
                           time:   [40.500 µs 40.509 µs 40.520 µs]
                           change: [-19.204% -19.017% -18.911%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 12 outliers among 100 measurements (12.00%)
     2 (2.00%) low mild
     5 (5.00%) high mild
     5 (5.00%) high severe
   
   convert_columns_prepared 4096 string_dictionary_low_cardinality(100, 0)
                           time:   [39.657 µs 39.665 µs 39.674 µs]
                           change: [-8.0164% -7.8251% -7.7172%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     2 (2.00%) high mild
     1 (1.00%) high severe
   
   convert_rows 4096 string_dictionary_low_cardinality(100, 0)
                           time:   [61.436 µs 61.448 µs 61.466 µs]
                           change: [-43.411% -43.330% -43.170%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     1 (1.00%) high mild
     2 (2.00%) high severe
   ```
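
Criterion's `change` line gives the relative change in estimated time against the previous baseline, as [lower bound, point estimate, upper bound]. As a rough sanity check, you can back-compute the baseline from the point estimates, assuming change = (new - old) / old. For example, the `convert_rows` result for `string_dictionary_low_cardinality(10, 0)` implies the earlier run took roughly 106 µs, about a 1.7× speedup. These figures are approximations derived from the reported estimates, not measurements from the run itself. The `baseline_us` and `speedup` helpers are hypothetical.

```rust
/// Back-computes an approximate baseline time from criterion's reported time
/// and relative change, assuming change = (new - old) / old.
fn baseline_us(new_us: f64, change_pct: f64) -> f64 {
    new_us / (1.0 + change_pct / 100.0)
}

/// Speedup factor (old / new) implied by a relative change in percent.
fn speedup(change_pct: f64) -> f64 {
    1.0 / (1.0 + change_pct / 100.0)
}

fn main() {
    // Point estimates (time in µs, change in %) copied from the output above.
    let results = [
        ("convert_columns (10, 0)", 38.956, -7.3418),
        ("convert_columns_prepared (10, 0)", 38.244, -6.9091),
        ("convert_rows (10, 0)", 61.258, -42.345),
        ("convert_columns (30, 0)", 39.175, -11.002),
        ("convert_columns_prepared (30, 0)", 38.493, -7.4510),
        ("convert_rows (30, 0)", 61.584, -42.220),
        ("convert_columns (100, 0)", 40.509, -19.017),
        ("convert_columns_prepared (100, 0)", 39.665, -7.8251),
        ("convert_rows (100, 0)", 61.448, -43.330),
    ];
    for &(name, new_us, change_pct) in results.iter() {
        println!(
            "{}: ~{:.2} µs -> {:.2} µs ({:.2}x)",
            name,
            baseline_us(new_us, change_pct),
            new_us,
            speedup(change_pct)
        );
    }
}
```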
