adriangb commented on PR #9972: URL: https://github.com/apache/arrow-rs/pull/9972#issuecomment-4622552203
@alamb I'm pretty convinced the benchmark regressions are just noise. For example: `main` vs. `main` in https://github.com/apache/arrow-rs/pull/9972#issuecomment-4618263660 `string/parquet_2` has an 18% swing. I ran a lot of testing on the exact same hardware as these bench runners and found that running any benchmark in isolation produces a <1% change on main vs. this branch. I measured instruction count and there was only a ~0.3% increase in instructions, not enough to cause a 70% performance impact. Running the entire suite I *was* able to reproduce the regressions, although not consistently. This led me to investigate allocator impact. The conclusion was that this is caused by rolling the dice on allocations, see https://github.com/apache/arrow-rs/pull/10068. These benches are all allocation heavy so any "bad luck" in ordering / timing of allocations (re-use vs. page fault) can result in big swings in results. With https://github.com/apache/arrow-rs/pull/10068 applied I see no regressions here even running the full suite. My suggestion is that we don't over-index on this and move forward with this change if everything else looks good. If you are happy with https://github.com/apache/arrow-rs/pull/10068 we can also merge that and re-run the benches, but I'm afraid that would delay the release and I do think it'd be nice to include this change in the release. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
