[GitHub] [arrow-rs] tustvold commented on pull request #3558: Re-encode dictionaries in selection kernels

via GitHub Fri, 01 Sep 2023 06:24:23 -0700


tustvold commented on PR #3558:
URL: https://github.com/apache/arrow-rs/pull/3558#issuecomment-1702743135


   The latest benchmarks are:
   
   ```
   concat str_dict 1024    time:   [4.1847 µs 4.1892 µs 4.1935 µs]
                           change: [-11.560% -11.248% -10.897%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     2 (2.00%) high mild
     1 (1.00%) high severe
   
   concat str_dict_sparse 1024
                           time:   [11.443 µs 11.452 µs 11.463 µs]
                           change: [+13.983% +14.341% +14.645%] (p = 0.00 < 0.05)
                           Performance has regressed.
   ```
   
   ```
   interleave dict(20, 0.0) 100 [0..100, 100..230, 450..1000]
                           time:   [3.4982 µs 3.5000 µs 3.5020 µs]
                           change: [+27.355% +27.663% +28.183%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 2 outliers among 100 measurements (2.00%)
     1 (1.00%) high mild
     1 (1.00%) high severe
   
   interleave dict(20, 0.0) 400 [0..100, 100..230, 450..1000]
                           time:   [6.5197 µs 6.5226 µs 6.5257 µs]
                           change: [+6.0766% +6.3786% +6.6982%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 3 outliers among 100 measurements (3.00%)
     1 (1.00%) high mild
     2 (2.00%) high severe
   
   interleave dict(20, 0.0) 1024 [0..100, 100..230, 450..1000]
                           time:   [12.783 µs 12.787 µs 12.791 µs]
                           change: [-4.4718% -4.2584% -4.1254%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 4 outliers among 100 measurements (4.00%)
     1 (1.00%) low mild
     1 (1.00%) high mild
     2 (2.00%) high severe
   
   interleave dict(20, 0.0) 1024 [0..100, 100..230, 450..1000, 0..1000]
                           time:   [13.352 µs 13.358 µs 13.364 µs]
                           change: [-2.1913% -1.9010% -1.6320%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 8 outliers among 100 measurements (8.00%)
     4 (4.00%) high mild
     4 (4.00%) high severe
   
   interleave dict_sparse(20, 0.0) 100 [0..100, 100..230, 450..1000]
                           time:   [3.5297 µs 3.5310 µs 3.5323 µs]
                           change: [+26.230% +26.553% +26.772%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 7 outliers among 100 measurements (7.00%)
     5 (5.00%) high mild
     2 (2.00%) high severe
   
   interleave dict_sparse(20, 0.0) 400 [0..100, 100..230, 450..1000]
                           time:   [6.5106 µs 6.5152 µs 6.5201 µs]
                           change: [+6.5054% +6.8053% +7.1029%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 3 outliers among 100 measurements (3.00%)
     3 (3.00%) high severe
   
   interleave dict_sparse(20, 0.0) 1024 [0..100, 100..230, 450..1000]
                           time:   [13.007 µs 13.010 µs 13.014 µs]
                           change: [-3.6668% -3.2717% -2.9611%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 7 outliers among 100 measurements (7.00%)
     4 (4.00%) high mild
     3 (3.00%) high severe
   
   interleave dict_sparse(20, 0.0) 1024 [0..100, 100..230, 450..1000, 0..1000]
                           time:   [13.288 µs 13.297 µs 13.307 µs]
                           change: [-2.8566% -2.6176% -2.4714%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 4 outliers among 100 measurements (4.00%)
     3 (3.00%) high mild
     1 (1.00%) high severe
   
   interleave dict_distinct 100
                           time:   [5.5472 µs 5.5536 µs 5.5606 µs]
                           change: [+0.6246% +0.9056% +1.1312%] (p = 0.00 < 0.05)
                           Change within noise threshold.
   Found 5 outliers among 100 measurements (5.00%)
     4 (4.00%) high mild
     1 (1.00%) high severe
   
   interleave dict_distinct 1024
                           time:   [5.5718 µs 5.5794 µs 5.5875 µs]
                           change: [+0.9659% +1.2001% +1.4061%] (p = 0.00 < 0.05)
                           Change within noise threshold.
   Found 10 outliers among 100 measurements (10.00%)
     6 (6.00%) high mild
     4 (4.00%) high severe
   
   interleave dict_distinct 2048
                           time:   [5.4061 µs 5.4136 µs 5.4212 µs]
                           change: [-2.1558% -1.8430% -1.5765%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
   ```
   
   My takeaway is that the regressions are too small in absolute terms to stand in the way of making this the default behaviour.
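
To put "absolute terms" in numbers, the relative changes can be converted back into microseconds. The sketch below is not part of the PR. It assumes the `change:` percentage is relative to the previous run, i.e. `(new - old) / old`, and the helper names `baseline_us` and `absolute_delta_us` are hypothetical.

```rust
/// Recover the previous (baseline) time in microseconds from the new time
/// and the relative change in percent (e.g. 14.341 for +14.341%).
/// Assumes the change is defined as (new - old) / old.
fn baseline_us(new_us: f64, change_pct: f64) -> f64 {
    new_us / (1.0 + change_pct / 100.0)
}

/// Absolute difference in microseconds between the new and baseline times.
/// Positive means slower than before, negative means faster.
fn absolute_delta_us(new_us: f64, change_pct: f64) -> f64 {
    new_us - baseline_us(new_us, change_pct)
}
```

Under that assumption, the `concat str_dict_sparse 1024` point estimates (11.452 µs, +14.341%) give `absolute_delta_us(11.452, 14.341)` of roughly 1.44 µs, and the `interleave dict(20, 0.0) 100` estimates (3.5000 µs, +27.663%) give roughly 0.76 µs.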
   
   

