> I'd be quite happy to accept a 25% regression on T9872c if it yielded
> a 1% improvement on compiling Cabal. T9872 is very very very strange!
> (Maybe if *all* the T9872 tests regressed, I'd be more worried.)

While I fully agree with this, we should *always* want to know when a
small synthetic benchmark regresses by a lot. In other words, we don't
want CI to ever accept such a regression on our behalf; the author of a
patch should have to explicitly ok it.

Otherwise we will end up slowing down a lot of seldom-used code paths quite significantly.

That said, I don't think this is really the issue here. The question is
rather: is 2% a large enough regression to worry about? 5%? 10%?
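
For concreteness, here is a purely hypothetical Haskell sketch of the
kind of acceptance-window check meant above. It is not GHC's actual
testsuite driver (that lives in Python); the names, types and example
tolerances are only illustrative:

-- Hypothetical sketch only: a per-metric acceptance window with an
-- explicit opt-in required for anything larger.
data Verdict
  = Accept                  -- within the window, CI may accept it
  | NeedsExplicitOk Double  -- regression fraction the author must ok
  deriving Show

-- | Relative change of a metric against its baseline, e.g. 0.05 for +5%.
relativeChange :: Double -> Double -> Double
relativeChange baseline new = (new - baseline) / baseline

-- | CI accepts anything within the tolerance; larger regressions must be
-- acknowledged explicitly by the patch author.
judge :: Double  -- ^ tolerance, e.g. 0.02 for a 2% window
      -> Double  -- ^ baseline metric (say, bytes allocated)
      -> Double  -- ^ metric measured with the patch applied
      -> Verdict
judge tolerance baseline new
  | change <= tolerance = Accept
  | otherwise           = NeedsExplicitOk change
  where
    change = relativeChange baseline new

main :: IO ()
main = do
  print (judge 0.02 1.0e9 1.015e9)  -- +1.5%: Accept
  print (judge 0.02 1.0e9 1.25e9)   -- +25%: NeedsExplicitOk 0.25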

Cheers,
Andreas

On 17/03/2021 at 14:39, Richard Eisenberg wrote:


> On Mar 17, 2021, at 6:18 AM, Moritz Angermann
> <moritz.angerm...@gmail.com> wrote:

> But what do we expect of patch authors? Right now if five people
> write patches to GHC, and each of them eventually manages to get their
> MRs green, after a long review, they finally see it assigned to
> Marge, and then it starts failing? Their patch on its own was fine,
> but their patch combined with other people's code leads to
> regressions? So we now expect all patch authors together to try to
> figure out what happened? Figuring out why something regressed is hard
> enough, and we only have very few people who are actually capable of
> debugging this. Thus I believe it would end up with Ben, Andreas,
> Matthew, Simon, ... or someone else from GHC HQ anyway having to
> figure out why it regressed, be it in the review stage, or dissecting
> a Marge aggregate, or on master.

I have previously posted against the idea of allowing Marge to accept
regressions... but the paragraph above is sadly convincing. Maybe
Simon is right about opening up the windows to, say, 100% (which
would catch a 10x regression) instead of infinite, but I'm now
convinced that Marge should be very generous in allowing regressions
-- provided we also have some way of monitoring drift over time.
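
For illustration, a purely hypothetical Haskell sketch of what such
drift monitoring could look like (this is not an existing GHC tool):
each commit may pass a generous per-commit window, yet the series gets
flagged once its cumulative change against an older baseline crosses a
tighter budget.

-- Hypothetical drift monitor, for illustration only: generous
-- per-commit windows combined with a tighter budget on cumulative change.

-- | Cumulative relative change of the latest measurement against a
-- fixed baseline taken some time ago.
cumulativeDrift :: Double -> [Double] -> Double
cumulativeDrift _        []           = 0
cumulativeDrift baseline measurements =
  (last measurements - baseline) / baseline

-- | True once the series has drifted past the budget (e.g. 0.05 = 5%),
-- even if every individual commit stayed inside its own window.
driftExceeded :: Double -> Double -> [Double] -> Bool
driftExceeded budget baseline measurements =
  cumulativeDrift baseline measurements > budget

main :: IO ()
main = do
  -- Five commits of roughly +1.5% each: no single step trips a 10%
  -- per-commit window, but the 5% drift budget is blown.
  let baseline = 100
      series   = [101.5, 103.0, 104.6, 106.2, 107.8]
  print (driftExceeded 0.05 baseline series)  -- True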

Separately, I've been concerned for some time about the peculiarity of
our perf tests. For example, I'd be quite happy to accept a 25%
regression on T9872c if it yielded a 1% improvement on compiling
Cabal. T9872 is very very very strange! (Maybe if *all* the T9872
tests regressed, I'd be more worried.) I would be very happy to learn
that some more general, representative tests are included in our
examinations.

Richard

_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs