Artem Pelenitsyn <a.pelenit...@gmail.com> writes:

> Thanks Ben, very interesting, especially the cloud Shake stuff.
>
>> If everyone were to use, e.g., ghc.nix this could be largely
>> mitigated, but this isn't the world in which we live.
>
> I don't know why everyone would necessarily need to use it to improve
> things. If there were a clear statement that you may elect to use
> ghc.nix and get an X% speedup, that would be a good start. People
> could then decide based on their setup, their aversion to tools like
> Nix, and so on.
>
> The real issue is that currently you don't benefit much from ghc.nix,
> because the main performance sink is the GHC tree itself. The way out
> is to use a cloud build system, most likely Cloud Shake, which, as
> you describe, has a couple of issues in this case:
>
> 1) Native dependencies. This should be possible to solve via Nix, but
>    unfortunately it is not quite there yet because Shake doesn't know
>    about Nix (afaik). I think to actually get there, you'd need some
>    sort of integration between Nix and Shake akin to what Tweag built
>    for Nix and Bazel (cf. rules_nixpkgs [1]). Their motto is: Nix for
>    "external" or "system" components, and Bazel for "internal" or
>    "local" ones.
I disagree. Caching is quite feasible while maintaining a clear division
between configuration and the build system. Today, the only channel of
communication between `configure` and Hadrian is
`hadrian/cfg/system.config`. If `ghc.nix` is doing its job correctly,
then two invocations of `./configure` in nix-shell on two different
machines should produce the same `hadrian/cfg/system.config`. Further,
if two trees have the same contents in that file, then they can share
build artifacts [1] (the first sketch at the end of this message
illustrates the idea). However, to reiterate, the real problem here is
the one below:

> 2) Bootstrapping aspect. Maybe this is a challenge for rebuilds after
>    modification, but I think people on this thread were quoting the
>    "time to first build" more. I don't see how avoiding building
>    master locally after a fresh (worktree) checkout, by downloading
>    build results from somewhere, connects to bootstrapping. I think
>    it doesn't.

If you merely want to build `master`, then indeed caching would work
fine. However, in that case you could also have simply downloaded a
binary distribution from GitLab. The problem is that usually the reason
you want to build `master` is that you then want to *modify* it. In
general, a modification of `master` will require rebuilding some subset
of the stage 1 GHC, which will then require a full build of stage 2
(which includes GHC, as well as all of the boot libraries, `base`,
etc.). The latter would see zero cache hits, since one of its inputs,
the stage 1 GHC, has changed (the second sketch below makes this
concrete). Unfortunately, the latter is also well over half of the
build effort.

> As for rebuilds. People are already using --freeze1 (you suggested it
> earlier in this very thread!),

Yes, but they are doing so explicitly, after having already built their
branch to produce a consistent stage 1 compiler. If you check out
`master`, build stage 1, switch to some arbitrary branch, and attempt
to build stage 2 with --freeze1, chances are you will end up with a
broken compiler. In the best case this will manifest as a build
failure. However, there is a non-negligible possibility that the
outcome is far more sinister (e.g. segmentation faults). The third
sketch below shows one way this foot-gun could at least be detected.

> so I don't see how saying "freezing stage 1 is dangerous even if
> faster" connects to the practice of GHC development. Of course, you
> may not find a remote cache with relevant artefacts after local
> updates, but that's not the point. The point is to not have to build
> `master`, not `feature-branch-t12345`. Rebuilds should be rather
> pain-free in comparison.

For safe caching we must be certain that the build graph is accurate
and complete. However, we know with certainty that it currently is not,
and fixing this requires real implementation effort (David Eichmann
spent a few months on this problem in 2019; see #16926 and related
tickets). Consequently, we must weigh the benefit of caching against
the development cost. Currently, my sense is that the benefit is that
some subset of the stage 1 build could be shared some of the time. This
strikes me as a rather small benefit compared to the cost.

Of course, we would love to have help in addressing #16926. In
principle, build caching would be nice; however, at the moment we just
don't believe that putting precious GHC team resources towards that
goal is the most effective way to serve users. If someone were to come
along and start chipping away at #16926, we would be happy to advise
and assist.
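To make the first point concrete, here is a minimal sketch in plain
Haskell (not actual Hadrian or Cloud Shake code; the function name and
key structure are my invention) of how a shared cache could key
artifacts on the contents of `hadrian/cfg/system.config`:

```haskell
-- Sketch only: identical `configure` output implies an identical key,
-- so two identically-configured trees could share cached artifacts.
import           Crypto.Hash     (Digest, SHA256, hashWith)
import qualified Data.ByteString as BS

-- | Hypothetical cache-key component derived from configure's sole
-- output channel to Hadrian.
configKey :: IO (Digest SHA256)
configKey = hashWith SHA256 <$> BS.readFile "hadrian/cfg/system.config"
```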
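The second sketch (again hypothetical; Hadrian's real rule structure is
more involved) shows why a stage 1 change wipes out all stage 2 cache
hits: the stage 1 GHC binary is an input to every stage 2 compilation,
so its hash appears in every stage 2 cache key.

```haskell
-- Sketch only: rebuild stage 1 and every Stage2Key changes, so the
-- entire stage 2 portion of the cache misses.
import           Crypto.Hash     (Digest, SHA256, hashWith)
import qualified Data.ByteString as BS

data Stage2Key = Stage2Key
  { stage1GhcHash :: Digest SHA256  -- changes on *any* stage 1 rebuild
  , inputHash     :: Digest SHA256  -- the module, flags, deps, ...
  } deriving (Eq, Show)

stage2Key :: FilePath -> FilePath -> IO Stage2Key
stage2Key stage1Ghc input =
    Stage2Key <$> hashFile stage1Ghc <*> hashFile input
  where
    hashFile f = hashWith SHA256 <$> BS.readFile f
```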
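Third, on the --freeze1 foot-gun: Hadrian does nothing like this today,
but one could imagine recording the commit from which stage 1 was built
and warning when it no longer matches HEAD. This is entirely a sketch;
the stamp file and its format are invented for illustration.

```haskell
-- Hypothetical guard: assume Hadrian wrote `git rev-parse HEAD` to a
-- stamp file when stage 1 was built.
import System.Process (readProcess)

freeze1Sanity :: FilePath -> IO ()
freeze1Sanity stampFile = do
    builtFrom <- readFile stampFile             -- recorded at stage 1 build
    headRev   <- readProcess "git" ["rev-parse", "HEAD"] ""
    if trim builtFrom == trim headRev
      then pure ()
      else putStrLn $
               "warning: stage 1 was built from " ++ trim builtFrom
            ++ " but HEAD is " ++ trim headRev
            ++ "; building stage 2 with --freeze1 may produce a broken compiler."
  where
    trim = takeWhile (/= '\n')
```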
Cheers,

- Ben

[1] Strictly speaking, I don't believe this is quite true today, since
    the absolute path of the working tree almost certainly leaks into
    the build artifacts. However, in principle this could be fixed.
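One hypothetical shape such a fix could take is to normalise the tree
root out of artifacts before hashing or uploading them. For
illustration only, this treats the artifact as text; real artifacts are
binary and would need a more careful rewriting pass:

```haskell
import qualified Data.Text    as T
import qualified Data.Text.IO as TIO

-- | Sketch: replace the absolute tree root with a stable placeholder
-- so two checkouts at different paths can share the resulting artifact.
normalise :: FilePath -> FilePath -> IO ()
normalise treeRoot artifact = do
    contents <- TIO.readFile artifact
    TIO.writeFile artifact
        (T.replace (T.pack treeRoot) (T.pack "$TOP") contents)
```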