RE: On CI

Simon Peyton Jones via ghc-devs Mon, 22 Feb 2021 01:07:03 -0800

Incremental CI can cut multiple hours to < mere minutes, especially with the 
test suite being embarrassingly parallel. There is simply no way optimizations 
to the compiler independent from sharing a cache between CI runs can get 
anywhere close to that return on investment.
I rather agree with this.  I don't think there is much low-hanging fruit on 
compile times, aside from coercion-zapping which we are working on anyway.  If 
we got a 10% reduction in compile time we'd be over the moon, but our users 
would barely notice.

To get truly substantial improvements (a factor of 2 or 10) I think we need to 
do less compiling - hence incremental CI.

Simon

From: ghc-devs <ghc-devs-boun...@haskell.org> On Behalf Of John Ericson
Sent: 22 February 2021 05:53
To: ghc-devs <ghc-devs@haskell.org>
Subject: Re: On CI

I'm not opposed to some effort going into this, but I would strongly oppose 
putting all our effort there. Incremental CI can cut multiple hours to < mere 
minutes, especially with the test suite being embarrassingly parallel. There is 
simply no way optimizations to the compiler independent from sharing a cache 
between CI runs can get anywhere close to that return on investment.

(FWIW, I'm also skeptical that the people complaining about GHC performance 
know what's hurting them most. For example, after non-incrementality, the next 
slowest thing is linking, which is...not done by GHC! But all that is a 
separate conversation.)

John
On 2/19/21 2:42 PM, Richard Eisenberg wrote:
There are some good ideas here, but I want to throw out another one: put all 
our effort into reducing compile times. There is a loud plea to do this on 
Discourse <https://discourse.haskell.org/t/call-for-ideas-forming-a-technical-agenda/1901/24>, 
and it would both solve these CI problems and also help everyone else.

This isn't to say to stop exploring the ideas here. But since time is mostly 
fixed, tackling compilation times in general may be the best way out of this. 
Ben's survey of other projects (thanks!) shows that we're way, way behind in 
how long our CI takes to run.

Richard

On Feb 19, 2021, at 7:20 AM, Sebastian Graf <sgraf1...@gmail.com> wrote:

Recompilation avoidance

I think in order to cache more in CI, we first have to invest some time in 
fixing recompilation avoidance in our bootstrapped build system.

I just tested on a hadrian perf ticky build: Adding one line of *comment* in 
the compiler causes

  *   a (pretty slow, yet negligible) rebuild of the stage1 compiler
  *   2 minutes of RTS rebuilding (Why do we have to rebuild the RTS? It 
doesn't depend in any way on the change I made)
  *   apparent full rebuild of the libraries
  *   apparent full rebuild of the stage2 compiler
That took 17 minutes; a full build takes ~45 minutes. So there definitely is 
some caching going on, but not nearly as much as there could be.
I know there have been great and boring efforts on compiler determinism in the 
past, but either it's not good enough or our build system needs fixing.
I think a good first step would be to assert that the hash of the stage1 
compiler executable doesn't change if I only change a comment.
I'm aware there probably is stuff going on, like embedding configure dates in 
interface files and executables, that would need to go, but if possible this 
would be a huge improvement.
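A minimal sketch of that first check, in Python: hash two stage1 executables, one built before and one after a comment-only edit, and compare. The function names and the idea of keeping a copy of the earlier binary around are illustrative assumptions, not something Hadrian provides.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large binaries need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def identical_binaries(before: Path, after: Path) -> bool:
    """True if the two executables are bit-for-bit identical.

    For a comment-only change this should hold; if it does not, something
    non-deterministic (e.g. an embedded configure date) leaks into the binary.
    """
    return file_sha256(before) == file_sha256(after)
```

Usage would be something like `identical_binaries(Path("stage1-before"), Path("stage1-after"))`, where both paths are hypothetical copies of the stage1 executable.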

On the other hand, we can simply tack on a [skip ci] to the commit message, as 
I did for https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4975. 
Variants like [skip tests] or [frontend] could help to identify which tests to 
run by default.
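The tag idea could be prototyped as a small job-selection function. Only [skip ci] is understood by GitLab itself; the [skip tests] and [frontend] tags, the job names, and the rule that several tags intersect are hypothetical choices made for illustration.

```python
import re

# Hypothetical job names; a real pipeline would use its own.
ALL_JOBS = frozenset({"build", "test-frontend", "test-full", "perf"})

# Hypothetical policy: each recognised tag restricts the jobs that run.
# Only "[skip ci]" is natively honoured by GitLab.
TAG_POLICY = {
    "skip ci": frozenset(),
    "skip tests": frozenset({"build"}),
    "frontend": frozenset({"build", "test-frontend"}),
}

TAG_RE = re.compile(r"\[([a-z ]+)\]")


def jobs_for_commit(message: str) -> frozenset:
    """Pick the jobs to run for a commit message.

    With no recognised tag everything runs; with several tags the
    restrictions are intersected.
    """
    selected = ALL_JOBS
    for tag in TAG_RE.findall(message.lower()):
        restriction = TAG_POLICY.get(tag.strip())
        if restriction is not None:
            selected = selected & restriction
    return selected
```

For example, `jobs_for_commit("Refactor parser [frontend]")` selects only the build and the frontend tests.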

Lean

I had a chat with a colleague about how they do CI for Lean. Apparently, CI 
turnaround time including tests is generally 25 minutes (~15 minutes for the 
build) for a complete pipeline, testing 6 different OSes and configurations in 
parallel: 
https://github.com/leanprover/lean4/actions/workflows/ci.yml
They utilise ccache to cache the clang-based C++-backend, so that they only 
have to re-run the front- and middle-end. In effect, they take advantage of the 
fact that the "function" clang, in contrast to the "function" stage1 compiler, 
stays the same.
It's hard to achieve that for GHC, where a complete compiler pipeline comes as 
one big, fused "function": An external tool can never be certain that a change 
to Parser.y could not affect the CodeGen phase.

Inspired by Lean, the following is a bit vague and imaginary, but maybe we 
could make it so that compiler phases "sign" parts of the interface file with 
the binary hash of the respective subcomponents of the phase?
E.g., if all the object files that influence CodeGen (that will later be linked 
into the stage1 compiler) result in a hash of 0xdeadbeef before and after the 
change to Parser.y, we know we can stop recompiling Data.List with the stage1 
compiler when we see that the IR passed to CodeGen didn't change, because the 
last compile did CodeGen with a stage1 compiler with the same hash 0xdeadbeef. 
The 0xdeadbeef hash is a proxy for saying "the function CodeGen stayed the 
same", so we can reuse its cached outputs.
Of course, that is utopian without a tool that does the "taint analysis" of 
which modules in GHC influence CodeGen. Perhaps just including all the 
transitive dependencies of GHC.CmmToAsm would suffice, but that is probably 
already too crude. For another example, a change to GHC.Utils.Unique would 
probably entail a full rebuild of the compiler because it basically affects all 
compiler phases.
There are probably parallels with recompilation avoidance in a language with 
staged meta-programming.
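The "signing" idea amounts to keying a phase's cached output on two hashes: one standing in for the phase's code (the object files judged to influence it) and one for the IR it receives. A rough Python sketch follows; `PhaseCache` and its arguments are hypothetical, and deciding which object files belong in `phase_objects` is exactly the unsolved taint analysis described above.

```python
import hashlib
from typing import Callable, Dict, Iterable, Sequence, Tuple


def digest(chunks: Iterable[bytes]) -> str:
    """Hash a sequence of byte strings, length-prefixing each one
    so that (b"ab", b"c") and (b"a", b"bc") hash differently."""
    h = hashlib.sha256()
    for c in chunks:
        h.update(len(c).to_bytes(8, "little"))
        h.update(c)
    return h.hexdigest()


class PhaseCache:
    """Cache a phase's output keyed on (phase code hash, input IR hash)."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], bytes] = {}
        self.hits = 0
        self.misses = 0

    def run(self, phase_objects: Sequence[bytes], ir: bytes,
            phase: Callable[[bytes], bytes]) -> bytes:
        # Sort so the key does not depend on the order objects are listed in.
        key = (digest(sorted(phase_objects)), digest([ir]))
        if key in self._store:
            self.hits += 1  # same "function", same input: reuse
            return self._store[key]
        self.misses += 1
        out = phase(ir)
        self._store[key] = out
        return out
```

In this model, an edit to Parser.y that changes neither the CodeGen-related objects nor the IR handed to CodeGen results in a cache hit, while any change to those objects forces the phase to run again.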

On Fri, 19 Feb 2021 at 11:42, Josef Svenningsson via ghc-devs 
<ghc-devs@haskell.org> wrote:
Doing "optimistic caching" like you suggest sounds very promising. A way to 
regain more robustness would be as follows.
If the build fails while building the libraries or the stage2 compiler, this 
might be a false negative due to the optimistic caching. Therefore, evict the 
"optimistic caches" and restart building the libraries. That way we can 
validate that the build failure was a true build failure and not just due to 
the aggressive caching scheme.
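The fallback could look roughly like this. The `build` and `evict_optimistic_cache` callbacks are hypothetical stand-ins for whatever the CI scripts would actually invoke. Note that the scheme only guards against spurious failures: an optimistic success is accepted as-is.

```python
from typing import Callable


def build_with_fallback(build: Callable[[bool], bool],
                        evict_optimistic_cache: Callable[[], None]) -> bool:
    """Try an optimistic build; on failure, evict and retry conservatively.

    build(optimistic) returns True on success. A success with optimistic
    caching is accepted directly; a failure is only reported once a
    conservative rebuild confirms it.
    """
    if build(True):
        return True
    # The failure may be a false negative caused by stale optimistic caches.
    evict_optimistic_cache()
    return build(False)
```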

Just my 2p

Josef

________________________________
From: ghc-devs <ghc-devs-boun...@haskell.org> on behalf of Simon Peyton Jones via ghc-devs <ghc-devs@haskell.org>
Sent: Friday, February 19, 2021 8:57 AM
To: John Ericson <john.ericson@obsidian.systems>; ghc-devs <ghc-devs@haskell.org>
Subject: RE: On CI

  1.  Building and testing happen together. When tests fail spuriously, we 
also have to rebuild GHC in addition to re-running the tests. That's pure 
waste. https://gitlab.haskell.org/ghc/ghc/-/issues/13897 tracks this more or 
less.
I don't get this.  We have to build GHC before we can test it, don't we?
2.  We don't cache between jobs.
This is, I think, the big one.   We endlessly build the exact same binaries.
There is a problem, though.  If we make *any* change in GHC, even a trivial 
refactoring, its binary will change slightly.  So now any caching build system 
will assume that anything built by that GHC must be rebuilt - we can't use the 
cached version.  That includes all the libraries and the stage2 compiler.  So 
caching can save all the preliminaries (building the initial Cabal, and a large 
chunk of stage1, since they are built with the same bootstrap compiler), but 
after that we are dead.
I don't know any robust way out of this.  That small change in the source code 
of GHC might be trivial refactoring, or it might introduce a critical 
mis-compilation which we really want to see in its build products.
However, for smoke-testing MRs, on every architecture, we could perhaps cut 
corners.  (Leaving Marge to do full diligence.)  For example, we could declare 
that if we have the result of compiling library module X.hs with the stage1 GHC 
in the last full commit in master, then we can re-use that build product rather 
than compiling X.hs with the MR's slightly modified stage1 GHC.  That *might* 
be wrong; but it's usually right.
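One way to express this corner-cutting is in the choice of cache key for a library module. In the sketch below (all names hypothetical, compiler binaries modelled as byte strings), a smoke-test build keys on master's stage1 compiler, so products from master's last full build can be reused, while a full build such as Marge's keys on the MR's own stage1 compiler.

```python
import hashlib


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def library_cache_key(module_src: bytes, mr_stage1: bytes,
                      master_stage1: bytes, smoke_test: bool) -> tuple:
    """Cache key for the build product of one library module.

    Smoke tests pretend the module was compiled by master's stage1
    (possibly wrong, usually right); full builds key on the MR's own
    stage1 and therefore never reuse master's products.
    """
    compiler = master_stage1 if smoke_test else mr_stage1
    return (sha256(module_src), sha256(compiler))
```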
Anyway, there are big wins to be had here.
Simon

From: ghc-devs <ghc-devs-boun...@haskell.org> On Behalf Of John Ericson
Sent: 19 February 2021 03:19
To: ghc-devs <ghc-devs@haskell.org>
Subject: Re: On CI

I am also wary of us deferring checking whole platforms and whatnot. I 
think that's just kicking the can down the road, and will result in more 
variance and uncertainty. It might be alright for those authoring PRs, but it 
will make Ben's job of keeping the system running even more grueling.
Before getting into these complex trade-offs, I think we should focus on the 
cornerstone issue that CI isn't incremental.

  1.  Building and testing happen together. When tests fail spuriously, we 
also have to rebuild GHC in addition to re-running the tests. That's pure 
waste. https://gitlab.haskell.org/ghc/ghc/-/issues/13897 tracks this more or 
less.
  2.  We don't cache between jobs. Shake and Make do not enforce dependency 
soundness, nor cache-correctness when the build plan itself changes, and this 
has made this hard/impossible to do safely. Naively this only helps with stage 
1 and not stage 2, but if we have separate stage 1 and --freeze1 stage 2 
builds, both can be incremental. Yes, this is also lossy, but I only see it 
leading to false failures, not false acceptances (if we can also test the stage 
1 one), so I consider it safe. MRs that only work with a slow full build 
because of ABI changes can indicate so.
The second, main part is quite hard to tackle, but I strongly believe 
incrementality is what we need most, and what we should remain focused on.
John
_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs