Ben Gamari <b...@well-typed.com> writes:

> Hi all,
>
> Recently our performance tests have been causing quite some pain. One
> reason for this is our new Darwin runners (see #19025), which
> (surprisingly) differ significantly in their performance characteristics
> (perhaps due to running Big Sur or using native tools provided by nix?).
>
> However, this is further exacerbated by the fact that quite a few
> people are currently working on compiler performance (hooray!). This
> leads to the following failure mode during Marge jobs:
>
> 1. Merge request A improves test T1234 by 0.5%, which is within the
>    test's acceptance window, so CI passes.
>
> 2. Merge request B *also* improves test T1234 by another 0.5%, which
>    similarly passes CI.
>
> 3. Marge tries to merge MRs A and B in a batch but finds that the
>    combined 1% improvement in T1234 is *outside* the acceptance window.
>    Consequently, the batch fails.
>
> This is quite painful, especially given that it creates work for those
> trying to improve GHC (as the saying goes: no good deed goes
> unpunished).
>
> To mitigate this I would suggest that we allow performance test
> failures in marge-bot pipelines. A slightly weaker variant of this idea
> would instead allow only performance *improvements*. I suspect the
> latter would get most of the benefit while eliminating the possibility
> that a large regression goes unnoticed.

To get things un-stuck I have disabled the affected tests on Darwin for
the time being. I hope we will be able to re-enable them once we have
migrated fully to the new runners, though only time will tell.
I will try to rebase the open MRs that are currently failing only due to
spurious performance failures, but please do feel free to hit rebase
yourself if I miss any.

Cheers,

- Ben
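[Editor's note: the batching failure mode described above comes down to a
simple property of per-commit acceptance windows: two changes that each
pass individually can fail when measured together. The sketch below
illustrates this with a hypothetical relative tolerance of 0.75% and
arbitrary baseline numbers; GHC's actual per-test windows and metrics
differ, so treat this as an illustration only, not the testsuite's real
check.]

```python
def within_window(baseline, measured, tolerance=0.0075):
    """Accept a measurement if its relative change from the baseline
    is within +/- tolerance. (Illustrative; not GHC's real harness.)"""
    return abs(measured - baseline) / baseline <= tolerance

baseline = 1000.0  # e.g. allocations for test T1234, arbitrary units

after_a = baseline * (1 - 0.005)      # MR A alone: 0.5% improvement
after_b = baseline * (1 - 0.005)      # MR B alone: 0.5% improvement
after_batch = baseline * (1 - 0.010)  # A and B batched: ~1% improvement

print(within_window(baseline, after_a))      # passes CI on its own
print(within_window(baseline, after_b))      # passes CI on its own
print(within_window(baseline, after_batch))  # batch exceeds the window
```

Against the same baseline, each individual 0.5% change sits inside the
window while the combined 1% change falls outside it, which is exactly
why the batch pipeline fails even though both MRs passed CI.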
_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs