I agree one should be able to get most of the testing value from stage1. And the tooling team at IOHK has done some work in https://gitlab.haskell.org/ghc/ghc/-/merge_requests/3652 to allow a stage 1 compiler to be tested. That's a very important first step!

But TH and GHCi require either iserv (the external interpreter) or, for the internal interpreter, a compiler whose own ABI matches the ABI of the code it produces, and ideally we should test both. I think doing a --freeze1 stage2 build *in addition* to the stage1 build would work in the majority of cases, and that would allow us to incrementally build and test both. Remember that iserv uses the ghc library and needs to be ABI compatible with the stage1 compiler that is using it, so it is less of a panacea than it might seem for ABI changes, as opposed to mere cross compilation.

I opened https://github.com/ghc-proposals/ghc-proposals/issues/162 for an ABI-agnostic interpreter that would give stage1 alone a third way to do GHCi and TH, one that works unconditionally. This would also allow TH to be used safely in GHC itself, but for the purposes of this discussion, it would make testing more reliable without the --freeze1 stage2 gamble.

Bottom line is, yes, building stage 2 from a freshly-built stage 1 will invalidate any cache, and so we should avoid that.

John

On 2/22/21 8:42 AM, Spiwack, Arnaud wrote:
Let me know if I'm talking nonsense, but I believe that we are building both stages for each architecture and flavour. Do we need to build two stages everywhere? What stops us from building a single stage? And if something does, what can we change to get into a situation where we can?

Even better than reusing builds incrementally is not building at all.

On Mon, Feb 22, 2021 at 10:09 AM Simon Peyton Jones via ghc-devs <ghc-devs@haskell.org <mailto:ghc-devs@haskell.org>> wrote:

    Incremental CI can cut multiple hours down to mere minutes,
    especially with the test suite being embarrassingly parallel.
    There is simply no way that optimizations to the compiler,
    independent of sharing a cache between CI runs, can get anywhere
    close to that return on investment.

    I rather agree with this.  I don’t think there is much low-hanging
    fruit on compile times, aside from coercion-zapping which we are
    working on anyway.  If we got a 10% reduction in compile time we’d
    be over the moon, but our users would barely notice.

    To get truly substantial improvements (a factor of 2 or 10) I
    think we need to do less compiling – hence incremental CI.


    Simon

    *From:*ghc-devs <ghc-devs-boun...@haskell.org
    <mailto:ghc-devs-boun...@haskell.org>> *On Behalf Of *John Ericson
    *Sent:* 22 February 2021 05:53
    *To:* ghc-devs <ghc-devs@haskell.org <mailto:ghc-devs@haskell.org>>
    *Subject:* Re: On CI

    I'm not opposed to some effort going into this, but I would
    strongly oppose putting all our effort there. Incremental CI can
    cut multiple hours down to mere minutes, especially with the test
    suite being embarrassingly parallel. There is simply no way that
    optimizations to the compiler, independent of sharing a cache
    between CI runs, can get anywhere close to that return on
    investment.

    (FWIW, I'm also skeptical that the people complaining about GHC
    performance know what's hurting them most. For example, after
    non-incrementality, the next slowest thing is linking, which
    is...not done by GHC! But all that is a separate conversation.)

    John

    On 2/19/21 2:42 PM, Richard Eisenberg wrote:

        There are some good ideas here, but I want to throw out
        another one: put all our effort into reducing compile times.
        There is a loud plea to do this on Discourse
        (https://discourse.haskell.org/t/call-for-ideas-forming-a-technical-agenda/1901/24),
        and it would both solve these CI problems and also help
        everyone else.

        This isn't to say to stop exploring the ideas here. But since
        time is mostly fixed, tackling compilation times in general
        may be the best way out of this. Ben's survey of other
        projects (thanks!) shows that we're way, way behind in how
        long our CI takes to run.

        Richard



            On Feb 19, 2021, at 7:20 AM, Sebastian Graf
            <sgraf1...@gmail.com <mailto:sgraf1...@gmail.com>> wrote:

            Recompilation avoidance

            I think in order to cache more in CI, we first have to
            invest some time in fixing recompilation avoidance in our
            bootstrapped build system.

            I just tested on a hadrian perf ticky build: Adding one
            line of *comment* in the compiler causes

              * a (pretty slow, yet negligible) rebuild of the stage1
                compiler
              * 2 minutes of RTS rebuilding (Why do we have to rebuild
                the RTS? It doesn't depend in any way on the change I
                made)
              * an apparent full rebuild of the libraries
              * an apparent full rebuild of the stage2 compiler

            That took 17 minutes; a full build takes ~45 minutes. So
            there definitely is some caching going on, but not nearly
            as much as there could be.

            I know there have been great and boring efforts on
            compiler determinism in the past, but either it's not good
            enough or our build system needs fixing.

            I think a good first step would be to assert that the hash
            of the stage1 compiler executable doesn't change if I only
            change a comment.

            I'm aware there probably is stuff going on, like embedding
            configure dates in interface files and executables, that
            would need to go, but if possible this would be a huge
            improvement.
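
            A minimal sketch of that assertion, in case it helps. The
            build paths and the "two builds differing only by a
            comment" setup are hypothetical; GHC.Fingerprint is in
            base and is what GHC itself uses for interface hashes:

                import GHC.Fingerprint (getFileHash)
                import System.Exit (exitFailure)

                -- Compare the stage1 binary from two builds that differ only
                -- by a comment; any difference means the build is not
                -- deterministic enough to cache on.
                main :: IO ()
                main = do
                  before <- getFileHash "_build/stage1-before/bin/ghc"  -- hypothetical path
                  after  <- getFileHash "_build/stage1-after/bin/ghc"   -- hypothetical path
                  if before == after
                    then putStrLn "stage1 binary unchanged; caches stay valid"
                    else do
                      putStrLn "stage1 binary changed after a comment-only edit"
                      exitFailure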

            On the other hand, we can simply tack on a [skip ci] to
            the commit message, as I did for
            https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4975.
            Variants like [skip tests] or [frontend] could help to
            identify which tests to run by default.

            Lean

            I had a chat with a colleague about how they do CI for
            Lean. Apparently, CI turnaround time including tests is
            generally 25 minutes (~15 minutes for the build) for a
            complete pipeline, testing 6 different OSes and
            configurations in parallel:
            https://github.com/leanprover/lean4/actions/workflows/ci.yml

            They utilise ccache to cache the clang-based C++-backend,
            so that they only have to re-run the front- and
            middle-end. In effect, they take advantage of the fact
            that the "function" clang, in contrast to the "function"
            stage1 compiler, stays the same.

            It's hard to achieve that for GHC, where a complete
            compiler pipeline comes as one big, fused "function": An
            external tool can never be certain that a change to
            Parser.y could not affect the CodeGen phase.

            Inspired by Lean, the following is a bit vague and
            imaginary, but maybe we could make it so that compiler
            phases "sign" parts of the interface file with the binary
            hash of the respective subcomponents of the phase?

            E.g., if all the object files that influence CodeGen (that
            will later be linked into the stage1 compiler) result in a
            hash of 0xdeadbeef before and after the change to
            Parser.y, we know we can stop recompiling Data.List with
            the stage1 compiler when we see that the IR passed to
            CodeGen didn't change, because the last compile did
            CodeGen with a stage1 compiler with the same hash
            0xdeadbeef. The 0xdeadbeef hash is a proxy for saying "the
            function CodeGen stayed the same", so we can reuse its
            cached outputs.
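
            To make that concrete, here is a rough sketch of the idea.
            Nothing in it is real Hadrian or GHC code; the function
            names are made up, and GHC.Fingerprint from base merely
            stands in for whatever hash we would actually use:

                import GHC.Fingerprint (Fingerprint, fingerprintFingerprints, getFileHash)

                -- Hash of the "function" CodeGen: the object files that will be
                -- linked into the stage1 compiler and that influence that phase.
                phaseHash :: [FilePath] -> IO Fingerprint
                phaseHash objs = fingerprintFingerprints <$> mapM getFileHash objs

                -- Reuse a cached CodeGen output only if both the IR fed to the
                -- phase and the phase itself (its 0xdeadbeef hash) are unchanged.
                canReuseCodeGenOutput
                  :: Fingerprint                 -- hash of the IR passed to CodeGen now
                  -> Fingerprint                 -- hash of CodeGen's objects now
                  -> (Fingerprint, Fingerprint)  -- (IR hash, phase hash) recorded last time
                  -> Bool
                canReuseCodeGenOutput irHash cgHash (cachedIr, cachedCg) =
                  irHash == cachedIr && cgHash == cachedCg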

            Of course, that is utopian without a tool that does the
            "taint analysis" of which modules in GHC influence
            CodeGen. Probably just including all the transitive
            dependencies of GHC.CmmToAsm suffices, but probably that's
            too crude already. For another example, a change to
            GHC.Utils.Unique would probably entail a full rebuild of
            the compiler because it basically affects all compiler phases.
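
            As a sketch of that crude approximation, the "modules that
            influence CodeGen" could be taken as the transitive
            closure of the import graph starting from GHC.CmmToAsm.
            The graph itself would have to come from the build system
            or ghc -M; this only shows the closure computation:

                import qualified Data.Map.Strict as M
                import qualified Data.Set as S

                type ModName = String

                -- All modules reachable from the root via the import graph,
                -- including the root itself.
                transitiveDeps :: M.Map ModName [ModName] -> ModName -> S.Set ModName
                transitiveDeps imports root = go S.empty [root]
                  where
                    go seen [] = seen
                    go seen (m:ms)
                      | m `S.member` seen = go seen ms
                      | otherwise         =
                          go (S.insert m seen) (M.findWithDefault [] m imports ++ ms)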

            There are probably parallels with recompilation avoidance
            in a language with staged meta-programming.

            On Fri, 19 Feb 2021 at 11:42, Josef Svenningsson via
            ghc-devs <ghc-devs@haskell.org
            <mailto:ghc-devs@haskell.org>> wrote:

                Doing "optimistic caching" like you suggest sounds
                very promising. A way to regain more robustness would
                be as follows.

                If the build fails while building the libraries or the
                stage2 compiler, this might be a false negative due to
                the optimistic caching. Therefore, evict the
                "optimistic caches" and restart building the
                libraries. That way we can validate that the build
                failure was a true build failure and not just due to
                the aggressive caching scheme.
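
                A rough sketch of that fallback, with all names
                invented purely for illustration:

                    -- Try the build against the optimistic cache; on failure,
                    -- evict the cache and rebuild cleanly, so we only report
                    -- failures that reproduce without the cache.
                    data BuildResult = Success | Failure
                      deriving (Eq, Show)

                    validateWithOptimisticCache
                      :: IO BuildResult  -- build libraries/stage2 against cached artefacts
                      -> IO ()           -- evict the "optimistic caches"
                      -> IO BuildResult  -- clean rebuild of the libraries
                      -> IO BuildResult
                    validateWithOptimisticCache cachedBuild evict cleanBuild = do
                      r <- cachedBuild
                      case r of
                        Success -> pure Success
                        Failure -> evict >> cleanBuild  -- confirm it is a true failure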

                Just my 2p

                Josef

                
------------------------------------------------------------------------

                *From:* ghc-devs <ghc-devs-boun...@haskell.org
                <mailto:ghc-devs-boun...@haskell.org>> on behalf of
                Simon Peyton Jones via ghc-devs <ghc-devs@haskell.org
                <mailto:ghc-devs@haskell.org>>
                *Sent:* Friday, February 19, 2021 8:57 AM
                *To:* John Ericson <john.ericson@obsidian.systems
                <mailto:john.ericson@obsidian.systems>>; ghc-devs
                <ghc-devs@haskell.org <mailto:ghc-devs@haskell.org>>
                *Subject:* RE: On CI

                 1. Building and testing happen together. When tests
                    fail spuriously, we also have to rebuild GHC in
                    addition to re-running the tests. That's pure
                    waste.
                    https://gitlab.haskell.org/ghc/ghc/-/issues/13897
                    tracks this more or less.

                I don’t get this.  We have to build GHC before we can
                test it, don’t we?

                2. We don't cache between jobs.

                This is, I think, the big one.   We endlessly build
                the exact same binaries.

                There is a problem, though.  If we make **any** change
                in GHC, even a trivial refactoring, its binary will
                change slightly.  So now any caching build system will
                assume that anything built by that GHC must be rebuilt
                – we can’t use the cached version.  That includes all
                the libraries and the stage2 compiler.  So caching can
                save all the preliminaries (building the initial
                Cabal, and a large chunk of stage1, since they are built
                with the same bootstrap compiler) but after that we
                are dead.

                I don’t know any robust way out of this. That small
                change in the source code of GHC might be trivial
                refactoring, or it might introduce a critical
                mis-compilation which we really want to see in its
                build products.

                However, for smoke-testing MRs, on every architecture,
                we could perhaps cut corners.  (Leaving Marge to do
                full diligence.)  For example, we could declare that
                if we have the result of compiling library module X.hs
                with the stage1 GHC in the last full commit in master,
                then we can re-use that build product rather than
                compiling X.hs with the MR’s slightly modified stage1
                GHC.  That **might** be wrong; but it’s usually right.
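
                To illustrate the corner-cutting (purely a sketch,
                with made-up names): for smoke-test jobs the cache key
                for a library build product would use the stage1 from
                the last full master build instead of the MR's freshly
                rebuilt stage1, so a trivial GHC change no longer
                invalidates every library object.

                    import GHC.Fingerprint
                      (Fingerprint, fingerprintFingerprints, getFileHash)

                    data Mode = SmokeTest | FullValidation

                    -- Key for the cached build product of one library module.
                    cacheKey
                      :: Mode
                      -> Fingerprint  -- hash of stage1 from the last full master build
                      -> Fingerprint  -- hash of the MR's freshly built stage1
                      -> FilePath     -- the library module, e.g. X.hs
                      -> IO Fingerprint
                    cacheKey mode masterStage1 mrStage1 srcFile = do
                      srcHash <- getFileHash srcFile
                      let compilerHash = case mode of
                            SmokeTest      -> masterStage1  -- usually right, might be wrong
                            FullValidation -> mrStage1      -- always right, rarely cache-hits
                      pure (fingerprintFingerprints [compilerHash, srcHash])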

                Anyway, there are big wins to be had here.

                Simon

                *From:*ghc-devs <ghc-devs-boun...@haskell.org
                <mailto:ghc-devs-boun...@haskell.org>> *On Behalf Of
                *John Ericson
                *Sent:* 19 February 2021 03:19
                *To:* ghc-devs <ghc-devs@haskell.org
                <mailto:ghc-devs@haskell.org>>
                *Subject:* Re: On CI

                I am also wary of deferring checks for whole platforms
                and whatnot. I think that's just kicking the can down
                the road, and will result in more variance and
                uncertainty. It might be alright for those authoring
                PRs, but it will make Ben's job of keeping the system
                running even more grueling.

                Before getting into these complex trade-offs, I think
                we should focus on the cornerstone issue that CI isn't
                incremental.

                 1. Building and testing happen together. When tests
                    fail spuriously, we also have to rebuild GHC in
                    addition to re-running the tests. That's pure
                    waste.
                    https://gitlab.haskell.org/ghc/ghc/-/issues/13897
                    tracks this more or less.
                 2. We don't cache between jobs. Shake and Make do not
                    enforce dependency soundness, nor
                    cache-correctness when the build plan itself
                    changes, and this has made it hard/impossible to
                    do safely. Naively this only helps with stage 1
                    and not stage 2, but if we have separate stage 1
                    and --freeze1 stage 2 builds, both can be
                    incremental (see the sketch below). Yes, this is
                    also lossy, but I only see it leading to false
                    failures, not false acceptances (if we can also
                    test the stage 1 build), so I consider it safe.
                    MRs that only work with a slow full build because
                    of ABI changes can indicate as much.
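
                The sketch mentioned in point 2, just to show the job
                shape. The exact Hadrian script path and targets here
                are my assumptions, not verified invocations;
                --freeze1 is the real flag for building against the
                frozen stage1:

                    import System.Process (callProcess)

                    -- Assumed wrapper around the in-tree Hadrian build script.
                    hadrian :: [String] -> IO ()
                    hadrian = callProcess "hadrian/build"

                    main :: IO ()
                    main = do
                      -- _build is assumed to be restored from the previous
                      -- pipeline's cache before this job runs.
                      hadrian ["--flavour=validate", "_build/stage1/bin/ghc"]  -- incremental stage1
                      hadrian ["--flavour=validate", "--freeze1"]              -- stage2, stage1 frozen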

                The second, main part is quite hard to tackle, but I
                strongly believe incrementality is what we need most,
                and what we should remain focused on.

                John

_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
