alamb commented on pull request #8635:
URL: https://github.com/apache/arrow/pull/8635#issuecomment-725414703
In my measurements, the `sum` compute kernels do appear to have significant
variability from run to run on my machine (details below). The variability
still exists with this PR though it appears to be less.
I think one issue, as @vertexclique has explained, is that different
random numbers are being used between runs (and between threads), and thus the
actual computations performed change from run to run.
Seeding the random number generator so it always produces the same sequence
of random numbers is definitely a classic way to reduce such variability and
seems like a good idea to me.
Seeding the random number generators *can* be done using the existing `rand`
crate, in the following way (this took me longer to figure out from the `rand`
crate's documentation than I would like to admit):
Instead of
```
let mut rng = rand::thread_rng();
```
Use
```
use rand::{rngs::StdRng, Rng, SeedableRng};

// A fixed seed makes the generator produce the same sequence on every run.
let mut rng = StdRng::seed_from_u64(42);
```
Here are some measurements I ran on my machine:
Master @ f7027b43d10bab4d8ca9397a753dc3553d88f146:
```
sum 512 time: [452.64 ns 456.16 ns 459.94 ns]
sum 512 time: [459.08 ns 462.78 ns 466.55 ns]
sum 512 time: [457.80 ns 461.87 ns 466.06 ns]
sum nulls 512 time: [246.69 ns 248.65 ns 250.87 ns]
sum nulls 512 time: [269.41 ns 271.79 ns 274.21 ns]
sum nulls 512 time: [247.98 ns 250.42 ns 252.92 ns]
```
Branch `ARROW-10551-fix-unreproducible-benches`:
```
sum 512 time: [476.98 ns 482.19 ns 488.17 ns]
sum 512 time: [470.75 ns 474.96 ns 479.42 ns]
sum 512 time: [506.47 ns 508.00 ns 509.59 ns]
sum nulls 512 time: [268.39 ns 270.32 ns 272.56 ns]
sum nulls 512 time: [272.20 ns 274.46 ns 276.81 ns]
sum nulls 512 time: [266.60 ns 269.12 ns 272.28 ns]
```
Master @ f7027b43d10bab4d8ca9397a753dc3553d88f146 w/ `StdRng::seed_from_u64`:
```
sum 512 time: [463.24 ns 467.51 ns 472.28 ns]
sum 512 time: [457.42 ns 460.00 ns 462.66 ns]
sum 512 time: [466.96 ns 471.60 ns 476.52 ns]
sum nulls 512 time: [236.96 ns 238.34 ns 239.76 ns]
sum nulls 512 time: [241.24 ns 243.24 ns 245.50 ns]
sum nulls 512 time: [246.68 ns 248.60 ns 250.61 ns]
```
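One rough way to put a number on the run-to-run variability is the relative spread, (max - min) / mean, of the middle value of each bracketed triple across the three runs. The sketch below does that for the figures above; `relative_spread`, `RUNS`, and `report` are names made up for this illustration, and with only three runs per configuration the result is a coarse indicator at best.

```rust
/// Relative spread of a set of timings: (max - min) / mean.
/// Returns `None` for an empty slice or a zero mean.
fn relative_spread(samples: &[f64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let min = samples.iter().cloned().fold(f64::INFINITY, f64::min);
    let max = samples.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let mean = samples.iter().sum::<f64>() / samples.len() as f64;
    if mean == 0.0 {
        return None;
    }
    Some((max - min) / mean)
}

/// Middle values (ns) of the three runs of each configuration listed above.
const RUNS: [(&str, [f64; 3]); 6] = [
    ("master, sum 512", [456.16, 462.78, 461.87]),
    ("master, sum nulls 512", [248.65, 271.79, 250.42]),
    ("PR branch, sum 512", [482.19, 474.96, 508.00]),
    ("PR branch, sum nulls 512", [270.32, 274.46, 269.12]),
    ("master + seed_from_u64, sum 512", [467.51, 460.00, 471.60]),
    ("master + seed_from_u64, sum nulls 512", [238.34, 243.24, 248.60]),
];

/// One formatted line per configuration, with the spread as a percentage.
fn report() -> Vec<String> {
    RUNS.iter()
        .filter_map(|(label, ns)| {
            relative_spread(ns).map(|s| format!("{}: {:.1}%", label, s * 100.0))
        })
        .collect()
}
```

Printing each line of `report()` gives a side-by-side view of the three configurations. More runs per configuration would be needed before reading much into the differences.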