Hi all,

Recently our performance tests have been causing quite a bit of pain. One reason for this is our new Darwin runners (see #19025), which (surprisingly) differ significantly in their performance characteristics, perhaps because they run Big Sur or use native tools provided by nix.
However, this is further exacerbated by the fact that quite a few people are currently working on compiler performance (hooray!). This leads to the following failure mode during Marge jobs:

1. Merge request A improves test T1234 by 0.5%, which is within the test's acceptance window, and therefore CI passes.

2. Merge request B *also* improves test T1234 by another 0.5%, which similarly passes CI.

3. Marge tries to merge MRs A and B in a batch but finds that the combined 1% improvement in T1234 falls *outside* the acceptance window. Consequently, the batch fails.

This is quite painful, especially given that it creates work for those trying to improve GHC (as the saying goes: no good deed goes unpunished).

To mitigate this I would suggest that we allow performance test failures in marge-bot pipelines. A slightly weaker variant of this idea would instead allow only performance *improvements*. I suspect the latter would capture most of the benefit while eliminating the possibility that a large regression goes unnoticed. A sketch of both checks follows my signature.

Thoughts?

Cheers,

- Ben
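P.S. To make the arithmetic concrete, here is a minimal Haskell sketch of the symmetric acceptance-window check and of the weaker marge-bot variant proposed above. The function names, the 0.8% window, and the allocation numbers are all made up for illustration; this is not our actual testsuite driver.

-- Relative change of a measurement against its baseline, in percent.
relativeChange :: Double -> Double -> Double
relativeChange baseline measured = (measured - baseline) / baseline * 100

-- Today's rule: a change passes only if its magnitude stays inside the
-- tolerance window, so a sufficiently large *improvement* also fails.
withinWindow :: Double -> Double -> Double -> Bool
withinWindow tol baseline measured =
  abs (relativeChange baseline measured) <= tol

-- The weaker variant, for marge-bot pipelines only: tolerate any
-- improvement, but still reject regressions beyond the window, so a
-- large regression cannot slip in unnoticed.
passesInMarge :: Double -> Double -> Double -> Bool
passesInMarge tol baseline measured =
  relativeChange baseline measured <= tol

main :: IO ()
main = do
  let baseline = 1000              -- e.g. allocations of T1234 on master
      tol      = 0.8               -- hypothetical +/-0.8% window
      afterA   = baseline * 0.995  -- MR A alone: 0.5% improvement
      afterAB  = baseline * 0.990  -- MRs A and B batched: 1% improvement
  print (withinWindow tol baseline afterA)    -- True:  A passes by itself
  print (withinWindow tol baseline afterAB)   -- False: the batch fails
  print (passesInMarge tol baseline afterAB)  -- True:  under the proposal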