alamb commented on pull request #8635:
URL: https://github.com/apache/arrow/pull/8635#issuecomment-725414703


   In my measurements, the `sum` compute kernels do appear to have significant 
variability from run to run on my machine (details below). The variability 
still exists with this PR though it appears to be less.
   
   I think one issue, as @vertexclique has explained, is that different 
random numbers are being used between runs (and between threads) and thus the 
actual computations performed from run to run are changing.
   
   Seeding the random number generator so it always produces the same sequence 
of random numbers is definitely a classic way to reduce such variability and 
seems like a good idea to me. 
   
   Seeding the random number generators *can* be done using the existing `rand` 
crate, in the following way (this took me longer to figure out from the `rand` 
crate's documentation than I would like to admit):
   
   Instead of
   ```
   let mut rng = rand::thread_rng();
   ```
   
   Use
   ```
   use rand::{
       Rng, SeedableRng,
       rngs::StdRng
   };
   let mut rng = StdRng::seed_from_u64(42);
   ```
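   
   To illustrate why a fixed seed makes the benchmark input reproducible, here 
is a minimal, dependency-free sketch. It is not the `rand` crate: `SplitMix64` 
is a small stand-in generator, and `make_input` is a hypothetical helper (not 
the actual bench code) that builds an input of a given size, optionally with 
nulls. Two calls with the same seed yield identical data, so every run does 
the same work.
   
   ```rust
   /// Stand-in PRNG (SplitMix64), used only so this sketch needs no external
   /// crates. In the real benches this role is played by `rand`'s `StdRng`.
   struct SplitMix64 {
       state: u64,
   }

   impl SplitMix64 {
       /// Mirrors the name of `SeedableRng::seed_from_u64` for readability.
       fn seed_from_u64(seed: u64) -> Self {
           Self { state: seed }
       }

       fn next_u64(&mut self) -> u64 {
           self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
           let mut z = self.state;
           z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
           z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
           z ^ (z >> 31)
       }

       /// Uniform f32 in [0, 1), built from the top 24 bits.
       fn next_f32(&mut self) -> f32 {
           (self.next_u64() >> 40) as f32 / (1u64 << 24) as f32
       }
   }

   /// Hypothetical helper: builds `size` values from a fixed seed. When
   /// `with_nulls` is set, roughly half of the slots are `None`.
   fn make_input(seed: u64, size: usize, with_nulls: bool) -> Vec<Option<f32>> {
       let mut rng = SplitMix64::seed_from_u64(seed);
       (0..size)
           .map(|_| {
               let value = rng.next_f32();
               let is_null = with_nulls && rng.next_u64() % 2 == 0;
               if is_null { None } else { Some(value) }
           })
           .collect()
   }

   fn main() {
       // Same seed, same size: the two inputs are identical element by element.
       let a = make_input(42, 512, true);
       let b = make_input(42, 512, true);
       assert_eq!(a, b);
       println!("same seed gives same input: {}", a == b);
   }
   ```
   
   With an unseeded, per-thread generator the two vectors would generally 
differ, which is the run-to-run difference in inputs described above.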
   
   
   Here are some measurements I ran on my machine:
   
   Master @ f7027b43d10bab4d8ca9397a753dc3553d88f146
   ```
   sum 512                 time:   [452.64 ns 456.16 ns 459.94 ns]
   sum 512                 time:   [459.08 ns 462.78 ns 466.55 ns]
   sum 512                 time:   [457.80 ns 461.87 ns 466.06 ns]
   sum nulls 512           time:   [246.69 ns 248.65 ns 250.87 ns]
   sum nulls 512           time:   [269.41 ns 271.79 ns 274.21 ns]
   sum nulls 512           time:   [247.98 ns 250.42 ns 252.92 ns]
   ```
   
   ARROW-10551-fix-unreproducible-benches
   ```
   sum 512                 time:   [476.98 ns 482.19 ns 488.17 ns]
   sum 512                 time:   [470.75 ns 474.96 ns 479.42 ns]
   sum 512                 time:   [506.47 ns 508.00 ns 509.59 ns]
   sum nulls 512           time:   [268.39 ns 270.32 ns 272.56 ns]
   sum nulls 512           time:   [272.20 ns 274.46 ns 276.81 ns]
   sum nulls 512           time:   [266.60 ns 269.12 ns 272.28 ns]
   ```
   
   Master @ f7027b43d10bab4d8ca9397a753dc3553d88f146 w/ `StdRng::seed_from_u64`:
   ```
   sum 512                 time:   [463.24 ns 467.51 ns 472.28 ns]
   sum 512                 time:   [457.42 ns 460.00 ns 462.66 ns]
   sum 512                 time:   [466.96 ns 471.60 ns 476.52 ns]
   sum nulls 512           time:   [236.96 ns 238.34 ns 239.76 ns]
   sum nulls 512           time:   [241.24 ns 243.24 ns 245.50 ns]
   sum nulls 512           time:   [246.68 ns 248.60 ns 250.61 ns]
   ```
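   
   One rough way to compare run-to-run variability across the three setups is 
to look at the spread of the middle value of each run. The sketch below 
assumes that the middle number in each bracket is the point estimate, and uses 
a hypothetical helper, `relative_spread_pct`, defined as (max - min) / mean in 
percent. The values are copied from the `sum 512` lines above; three runs per 
setup is a small sample, so treat the output as indicative only.
   
   ```rust
   /// Hypothetical helper: relative spread of the samples, (max - min) / mean,
   /// expressed in percent. Returns `None` for an empty slice.
   fn relative_spread_pct(samples: &[f64]) -> Option<f64> {
       if samples.is_empty() {
           return None;
       }
       let min = samples.iter().cloned().fold(f64::INFINITY, f64::min);
       let max = samples.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
       let mean = samples.iter().sum::<f64>() / samples.len() as f64;
       Some((max - min) / mean * 100.0)
   }

   fn main() {
       // Middle values (ns) of the three `sum 512` runs for each setup.
       let runs = [
           ("master", [456.16, 462.78, 461.87]),
           ("ARROW-10551-fix-unreproducible-benches", [482.19, 474.96, 508.00]),
           ("master w/ StdRng::seed_from_u64", [467.51, 460.00, 471.60]),
       ];
       for (label, estimates) in runs.iter() {
           if let Some(pct) = relative_spread_pct(estimates) {
               println!("{}: {:.2}%", label, pct);
           }
       }
   }
   ```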
   



