If we make the changes proposed in the "On CI" thread, I think we solve
most of this for free. If a perf test fails, resubmitting should not
need to rebuild GHC because it is cached. I'd say at that point, it's
not even worth making Marge bot auto-accept extra improvements because
restarting with new perf windows should be so easy.
John
On 2/22/21 8:46 AM, Andreas Klebinger wrote:
This seems quite reasonable to me.
Not sure about the cost of implementing it (and the feasibility of it
if/when merge-trains arrive).
Andreas
On 21/02/2021 at 21:31, Richard Eisenberg wrote:
On Feb 21, 2021, at 11:24 AM, Ben Gamari <b...@well-typed.com> wrote:
To mitigate this I would suggest that we allow performance test failures
in marge-bot pipelines. A slightly weaker variant of this idea would
instead only allow performance *improvements*. I suspect the latter
would get most of the benefit, while eliminating the possibility that a
large regression goes unnoticed.
The value in making performance improvements a test failure is that it
informs patch authors of what they have done, so they can check that it
matches their expectations. This need can reasonably be satisfied
without stopping merging. That is, if Marge can accept performance
improvements while (say) posting to each MR involved that it may
have contributed to a performance improvement, then I think we've
done our job here.
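Concretely, the rule being proposed could be as simple as the following
sketch (the types, field names, and message wording here are purely
illustrative and hypothetical; this is not how Marge actually works):

    -- Purely illustrative: not Marge's actual code, nor GHC CI's metric names.
    data PerfChange = PerfChange
      { testName :: String
      , deltaPct :: Double  -- change vs. baseline; negative means an improvement
      }

    data Verdict
      = Accept (Maybe String)  -- merge, optionally posting a note to the MRs involved
      | Reject String          -- fail the pipeline with an explanation

    judge :: PerfChange -> Verdict
    judge pc
      | deltaPct pc < 0 = Accept (Just (testName pc
          ++ " improved by " ++ show (abs (deltaPct pc))
          ++ "%; please check this matches your expectations"))
      | deltaPct pc == 0 = Accept Nothing
      | otherwise = Reject (testName pc
          ++ " regressed by " ++ show (deltaPct pc) ++ "%")

    -- e.g. judge (PerfChange "T9872" (-3.2)) would accept and post a note,
    -- while judge (PerfChange "T9872" 3.2) would fail the pipeline.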
On the other hand, a performance degradation is a bug, just like,
say, an error message regression. Even if it's a combination of
commits that cause the problem (an actual possibility even for error
message regressions), it's still a bug that we need to either fix or
accept (balanced out by other improvements). The pain of debugging
this scenario might be mitigated if there were a collation of the
performance wibbles for each individual commit. This information is,
in general, available: each commit passed CI on its own, and so it
should be possible to create a little report with its rows being perf
tests and its columns being commits or MR #s; each cell in the table
would be a percentage regression. If we're lucky, the regression
Marge sees will be the sum(*) of the entries in one of the rows --
this means that we have a simple agglomeration of performance
degradation. If we're less lucky, the whole will not equal the sum of
the parts, and some of the patches interfere. In either case, the
table would suggest a likely place to look next.
(*) I suppose if we're recording percentages, it wouldn't necessarily
be the actual sum, because percentages compound rather than add: two
successive 10% regressions make a 21% regression overall, not 20%. But
you get my meaning.
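To make the table idea concrete, here is a rough Haskell sketch of the
collation (the Measurement type and the tab-separated rendering are my
own invention for illustration; no such tool exists in GHC's CI today):

    import qualified Data.Map.Strict as Map
    import Data.List (intercalate)

    -- Hypothetical sketch, not an existing GHC/CI tool.
    -- One measurement: for a given commit/MR and perf test, the percentage
    -- change relative to that commit's own baseline CI run.
    data Measurement = Measurement
      { commit :: String
      , test   :: String
      , pct    :: Double
      }

    -- Rows are perf tests, columns are commits; cells are percentage changes.
    collate :: [Measurement] -> Map.Map String (Map.Map String Double)
    collate = foldr insert Map.empty
      where
        insert m = Map.insertWith Map.union (test m) (Map.singleton (commit m) (pct m))

    -- Percentages compound rather than add, so the "expected" combined change
    -- for a row is the product of the per-commit factors, not their sum.
    compounded :: [Double] -> Double
    compounded ps = (product [1 + p / 100 | p <- ps] - 1) * 100

    -- Render one tab-separated line per perf test: the per-commit cells in the
    -- given column order, followed by the compounded total for that row.
    render :: [String] -> Map.Map String (Map.Map String Double) -> String
    render commits table = unlines
      [ intercalate "\t"
          (t : [maybe "-" show (Map.lookup c row) | c <- commits]
             ++ [show (compounded (Map.elems row))])
      | (t, row) <- Map.toList table ]

Comparing the compounded total of a row against what Marge measured on
the merge commit would show at a glance whether the regression is a
simple accumulation or an interaction between patches.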
Pulling this all together:
* I'm against the initial proposal of allowing all performance
failures by Marge. This will allow bugs to accumulate (in my opinion).
* I'm in favor of allowing performance improvements to be accepted by
Marge.
* To mitigate against the information loss of Marge accepting
performance improvements, it would be great if Marge could alert MR
authors that a cumulative performance improvement took place.
* To mitigate against the annoyance of finding a performance
regression in a merge commit that does not appear in any component
commit, it would be great if there were a tool to collect performance
numbers from a set of commits and present them in a table for further
analysis.
These "mitigations" might take work. If labor is impossible to
produce to complete this work, I'm in favor of simply allowing the
performance improvements, maybe also filing a ticket about these
potential improvements to the process.
Richard
_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs