alamb commented on pull request #8635: URL: https://github.com/apache/arrow/pull/8635#issuecomment-725414703
In my measurements, the `sum` compute kernels do appear to have significant variability from run to run on my machine (details below). The variability still exists with this PR, though it appears to be smaller.

I think one issue, as @vertexclique has explained, is that different random numbers are being used between runs (and between threads), and thus the actual computations performed change from run to run. Seeding the random number generator so it always produces the same sequence of random numbers is definitely a classic way to reduce such variability and seems like a good idea to me.

Seeding the random number generators *can* be done using the existing `rand` crate, in the following way (this took me longer to figure out from the `rand` crate's documentation than I would like to admit):

Instead of
```
let mut rng = rand::thread_rng();
```

Use
```
use rand::{ Rng, SeedableRng, rngs::StdRng };
let mut rng = StdRng::seed_from_u64(42);
```

Here are some measurements I ran on my machine:

Master @ f7027b43d10bab4d8ca9397a753dc3553d88f146
```
sum 512          time: [452.64 ns 456.16 ns 459.94 ns]
sum 512          time: [459.08 ns 462.78 ns 466.55 ns]
sum 512          time: [457.80 ns 461.87 ns 466.06 ns]
sum nulls 512    time: [246.69 ns 248.65 ns 250.87 ns]
sum nulls 512    time: [269.41 ns 271.79 ns 274.21 ns]
sum nulls 512    time: [247.98 ns 250.42 ns 252.92 ns]
```

ARROW-10551-fix-unreproducible-benches
```
sum 512          time: [476.98 ns 482.19 ns 488.17 ns]
sum 512          time: [470.75 ns 474.96 ns 479.42 ns]
sum 512          time: [506.47 ns 508.00 ns 509.59 ns]
sum nulls 512    time: [268.39 ns 270.32 ns 272.56 ns]
sum nulls 512    time: [272.20 ns 274.46 ns 276.81 ns]
sum nulls 512    time: [266.60 ns 269.12 ns 272.28 ns]
```

Master @ f7027b43d10bab4d8ca9397a753dc3553d88f146 w/ `StdRng::seed_from_u64`:
```
sum 512          time: [463.24 ns 467.51 ns 472.28 ns]
sum 512          time: [457.42 ns 460.00 ns 462.66 ns]
sum 512          time: [466.96 ns 471.60 ns 476.52 ns]
sum nulls 512    time: [236.96 ns 238.34 ns 239.76 ns]
sum nulls 512    time: [241.24 ns 243.24 ns 245.50 ns]
sum nulls 512    time: [246.68 ns 248.60 ns 250.61 ns]
```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org