Ah! I see. This is a bit disappointing, as it reduces the utility of fsatrace linting: the programmer is forced to decide whether shallow dependencies are sufficient (i.e. whether changes in deep dependencies always change shallow dependencies). Hopefully similar scenarios are rare. Perhaps the best step forward is to simply silence the linting using `trackAllow ["//*.hi"]` for Haskell object rules. Then I can continue tracking down other missing dependencies in Hadrian with fsatrace linting.
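
As a rough illustration of what silencing `*.hi` would leave behind, here is a minimal sketch in plain Haskell. It is a toy model of the effect on a lint report, not how Shake implements `trackAllow`. The names `silenceHi` and `sampleReport` are assumptions made for this example; the paths are a few entries from the lint error quoted further down the thread.

```haskell
import Data.List (isSuffixOf)

-- | Drop interface files from a list of "used but not depended upon"
-- entries. This only mimics the visible effect of allowing "//*.hi";
-- it is not Shake's implementation.
silenceHi :: [FilePath] -> [FilePath]
silenceHi = filter (not . (".hi" `isSuffixOf`))

-- | A few entries taken from the lint error quoted later in the thread.
sampleReport :: [FilePath]
sampleReport =
  [ "_build/HEAD_default/stage0/lib/settings"
  , "_build/HEAD_default/stage0/lib/platformConstants"
  , "_build/HEAD_default/stage1/lib/package.conf.d/package.cache"
  , "_build/HEAD_default/stage1/lib/x86_64-linux-ghc-8.9.20190325/base-4.13.0.0/GHC/Base.hi"
  , "_build/HEAD_default/stage1/lib/x86_64-linux-ghc-8.9.20190325/ghc-prim-0.5.3/GHC/Types.hi"
  ]
```

Evaluating `silenceHi sampleReport` in GHCi keeps only the three non-interface entries, which are the kind of missing dependency that would still need tracking down.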

On 3/27/19 5:38 PM, Andrey Mokhov wrote:
Simon's insight is great: if deep dependencies are captured by shallow dependencies then the cloud build system is correct even if only direct shallow inputs are tracked.

That's a very non-trivial invariant, and I guess this means we can't rely on fsatrace linting for GHC compilation rules, because all deep dependencies will be reported as untracked.

Cheers,
Andrey

On 27 Mar 2019 18:27, Simon Peyton Jones <simo...@microsoft.com> wrote:

    > With that in mind, and considering a cloud build system where
    > "*all direct inputs and direct outputs must be declared*"

    *But I question that assumption*.  As I mentioned, with GHC at
    least, if a deep dependency changes then one of the shallow
    dependencies will change.  So I claim that even for cloud build it
    should be enough to depend only on shallow dependencies.

    This is only true because GHC offers this guarantee.  We’d need to
    be sure that every deep dependency was either ‘needed’ or was
    reflected in the contents (perhaps via a fingerprint) of another
    ‘needed’ thing.
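
The invariant can be illustrated with a small toy model, sketched here under loud assumptions: each interface's fingerprint is taken to include the fingerprints of the interfaces it imports. This is not GHC's actual fingerprinting scheme. The names `imports`, `fingerprint`, `shallowKey`, `before` and `after` are hypothetical, and the fingerprint is left as unhashed text so the result is easy to inspect.

```haskell
import Data.List (intercalate)

type Module = String
type Fingerprint = String

-- | Hypothetical module graph: A imports B, and B imports C.
imports :: Module -> [Module]
imports "A" = ["B"]
imports "B" = ["C"]
imports _   = []

-- | Toy fingerprint: a module's own interface text followed by the
-- fingerprints of the interfaces it imports. A real system would hash this.
fingerprint :: (Module -> String) -> Module -> Fingerprint
fingerprint iface m =
  iface m ++ "[" ++ intercalate "," (map (fingerprint iface) (imports m)) ++ "]"

-- | Cache key for a module's object file built from shallow dependencies only.
shallowKey :: (Module -> String) -> Module -> [Fingerprint]
shallowKey iface m = map (fingerprint iface) (imports m)

-- | Interface text before and after a change to C's interface.
before, after :: Module -> String
before m  = "interface of " ++ m
after "C" = "interface of C, changed"
after m   = before m
```

In this model `shallowKey before "A"` and `shallowKey after "A"` differ even though C is not a shallow dependency of A, because the change to C is reflected in B's fingerprint.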

    Simon

    *From:* ghc-devs <ghc-devs-boun...@haskell.org> *On Behalf Of*
    David Eichmann
    *Sent:* 27 March 2019 17:12
    *To:* Andrey Mokhov <andrey.mok...@newcastle.ac.uk>; Neil Mitchell
    <ndmitch...@gmail.com>
    *Cc:* GHC developers <ghc-devs@haskell.org>
    *Subject:* Re: Hadrian Transitive Dependencies

    Hello,

    To reiterate some definitions consider this scenario:

      * A.hs imports B.hs and B.hs imports C.hs
      * `ghc -M A.hs` reports that A.o depends on: A.hs, B.hi
      * `ghc -c A.hs` produces A.o and accesses A.hs, B.hi, and C.hi

    There seems to be some confusion about the term "Direct
    Dependency", so I'll use these definitions:

    "Shallow Dependency": With respect to a Haskell object file X.o,
    the shallow dependencies are the source file X.hs and interface
    files Y.hi for all modules Y imported by X.

      * These are the dependencies of X.o as reported by `ghc -M X.hs`
      * In the above scenario:
          o A.o depends on: A.hs, B.hi

    "Deep Dependency": With respect to a Haskell object file X.o, the
    deep dependencies are all hi files required by ghc to build X.o
    excluding shallow dependencies:

      * This is a subset of modules transitively imported by X
      * These dependencies are NOT reported by `ghc -M X.hs`

    "Direct Dependency": if the command to create file X accesses file
    Y, then X directly depends on Y (= Y is a direct dependency of X).

      * In the above scenario:
          o A.o directly depends on: A.hs, B.hi, and C.hi

      * SPJ noted that .hi files list direct dependencies.
      * The direct dependencies of a Haskell object file are the union
        of its shallow and deep dependencies.

    "Direct Output": All files created by a rule.
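
The definitions above can be checked against the A/B/C scenario with a minimal sketch. The two input lists are copied from the scenario; the names `shallowDeps`, `directDeps`, `deepDeps` and `directFromParts` are chosen for this example only.

```haskell
import Data.List ((\\), union)

-- | Files reported by `ghc -M A.hs` for A.o in the scenario above.
shallowDeps :: [FilePath]
shallowDeps = ["A.hs", "B.hi"]

-- | Files accessed by `ghc -c A.hs` in the scenario above.
directDeps :: [FilePath]
directDeps = ["A.hs", "B.hi", "C.hi"]

-- | Deep dependencies: accessed files that `ghc -M` did not report.
deepDeps :: [FilePath]
deepDeps = directDeps \\ shallowDeps

-- | The direct dependencies rebuilt as the union of shallow and deep ones.
directFromParts :: [FilePath]
directFromParts = shallowDeps `union` deepDeps
```

Here `deepDeps` evaluates to `["C.hi"]`, which is also what a linter would flag as untracked if only the shallow dependencies were declared.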

    With that in mind, and considering a cloud build system where
    "*all direct inputs and direct outputs must be declared*" (where
    this agrees with the definitions above) can we do the following
    for the build rule of a haskell object file X.o?

     1. `need` the shallow dependencies as reported by `ghc -M`. This
        guarantees that all shallow and deep dependencies (i.e. all
        direct dependencies) are built.
     2. Build X.o and X.hi
     3. Inspect X.hi to derive the direct dependencies (and hence deep
        dependencies)
     4. `needed` the deep dependencies

    Is there already an easy way to inspect *.hi files in this way? Is
    this use of `needed` valid?

    - David E

    On 3/27/19 3:05 PM, Andrey Mokhov wrote:

        Hi David,

        We had a discussion about this with Neil some time ago, and I
        think we had the following list of progressively more complex
        invariants for different types of build systems:

          * Non-cloud build systems: **all direct inputs must be
            declared**. If you miss a direct input dependency then a
            build may complete successfully but with an incorrect result.

          * Cloud build systems: **all direct inputs and direct
            outputs must be declared**. If you miss a direct output
            then a build may fail because the cloud will not be able
            to restore the corresponding output.

          * Cloud build systems with shallow (deferred)
            materialisation of build artefacts: **all transitive
            inputs and direct outputs must be declared**. Let’s say
            you’d like to download the resulting GHC binary directly,
            without materialising any intermediate artefacts. Then
            you’ll need to know GHC’s ultimate transitive inputs.

        I think for now we are really keen to make Hadrian a cloud
        build system, but whether shallow builds are valuable enough
        is not clear. Maybe not. Therefore, I’d say we don’t need to
        track transitive inputs right now. Furthermore, if we were to
        track all transitive inputs, we would lose the desirable early
        cutoff property, which prevents rebuilding after adding a
        comment in a file on which a lot of other files transitively
        depend.
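
The early cutoff point can be sketched with a toy rebuild check. All names and hash values below are hypothetical, and the model assumes that adding a comment to C.hs changes the hash of C.hs but leaves C.hi unchanged.

```haskell
type Hashes = [(FilePath, Int)]

-- | A rule reruns only if one of its declared inputs has a different hash.
mustRebuild :: Hashes -> Hashes -> [FilePath] -> Bool
mustRebuild old new = any (\f -> lookup f old /= lookup f new)

-- | Made-up content hashes before and after adding a comment to C.hs:
-- the source file changes, the interface file C.hi does not.
oldHashes, newHashes :: Hashes
oldHashes = [("A.hs", 1), ("B.hs", 5), ("B.hi", 2), ("C.hs", 3), ("C.hi", 4)]
newHashes = [("A.hs", 1), ("B.hs", 5), ("B.hi", 2), ("C.hs", 30), ("C.hi", 4)]

-- | Inputs of A.o when only the files accessed by the rule are declared.
directInputs :: [FilePath]
directInputs = ["A.hs", "B.hi", "C.hi"]

-- | Inputs of A.o when transitive source inputs are declared as well.
transitiveInputs :: [FilePath]
transitiveInputs = directInputs ++ ["B.hs", "C.hs"]
```

With these values, `mustRebuild oldHashes newHashes directInputs` is `False` while the same check on `transitiveInputs` is `True`, which is the loss of early cutoff described above.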

        Having said that, if we really access a file during
        compilation, then I think it is **not** a transitive
        dependency by definition! Any file which is accessed during a
        build rule is a direct dependency.

        > GHC is reading *.hi files that are not reported as dependencies by
        > `ghc -M -include-pkg-deps`. This is because they are not direct, but
        > transitive dependencies!

        So, here I’m confused. If we read a file A when compiling a
        file B, then it’s by definition a direct dependency. Perhaps
        we just read too much? Maybe the solution is to switch to
        fine-grained `ghc -M` mode, to analyse import dependencies for
        a single module instead of doing it transitively, which I
        believe was discussed in a ticket some time ago? I can’t find
        this ticket, but I think Alp was looking into it at some
        point. Alp: do you remember it?

        Thank you for all your work on Hadrian!

        Cheers,

        Andrey

        *From:*David Eichmann [mailto:dav...@well-typed.com]
        *Sent:* 27 March 2019 12:54
        *To:* Neil Mitchell <ndmitch...@gmail.com>; Andrey Mokhov
        <andrey.mok...@newcastle.ac.uk>; GHC developers
        <ghc-devs@haskell.org>
        *Subject:* Hadrian Transitive Dependencies

        Hello Shake/Hadrian contributors and the like,

        Recently I've been putting Hadrian's fsatrace linting feature
        to good use, tracking down missing dependencies in Hadrian.
        Ultimately, we want to use shake's cloud build / shared cache
        feature and ensure it works across CI builds. Unfortunately
        the feature isn't working smoothly with Hadrian: see #16295
        <https://gitlab.haskell.org/ghc/ghc/issues/16295>.
        This is very desirable to improve CI build times. It is my
        understanding that in order to get caching to work:

        1. All accessed files must be declared with `need` AND
        2. All created files must be declared with `produces` (or be
        the target of the build rule)

        Is my understanding correct? Or is there a weaker condition
        (perhaps only 2 is necessary)?

        If I'm correct, this amounts to fixing all fsatrace lint
        errors. See here
        <https://gitlab.haskell.org/ghc/ghc/issues/16400#note_188901>
        for a breakdown of lint errors / missing dependencies. A large
        portion of these are Haskell interface files (i.e. *.hi
        files). Before building a Haskell object file, dependencies
        are discovered via `ghc` using the `-M  -include-pkg-deps`
        options. Unfortunately, shake's fsatrace linting complains
        about other *.hi files being accessed! For example when
        building
        `stage1/libraries/mtl/build/Control/Monad/RWS/Class.o` we get
        the following dependencies from ghc:

        _build/stage1/libraries/mtl/build/Control/Monad/RWS/Class.o : libraries/mtl/Control/Monad/RWS/Class.hs
        _build/stage1/libraries/mtl/build/Control/Monad/RWS/Class.o : _build/stage1/lib/../lib/x86_64-linux-ghc-8.9.20190325/base-4.13.0.0/Prelude.hi
        _build/stage1/libraries/mtl/build/Control/Monad/RWS/Class.o : _build/stage1/lib/../lib/x86_64-linux-ghc-8.9.20190325/base-4.13.0.0/Data/Monoid.hi
        _build/stage1/libraries/mtl/build/Control/Monad/RWS/Class.o : _build/stage1/lib/../lib/x86_64-linux-ghc-8.9.20190325/transformers-0.5.5.0/Control/Monad/Trans/RWS/Strict.hi
        _build/stage1/libraries/mtl/build/Control/Monad/RWS/Class.o : _build/stage1/lib/../lib/x86_64-linux-ghc-8.9.20190325/transformers-0.5.5.0/Control/Monad/Trans/RWS/Lazy.hi
        _build/stage1/libraries/mtl/build/Control/Monad/RWS/Class.o : _build/stage1/lib/../lib/x86_64-linux-ghc-8.9.20190325/transformers-0.5.5.0/Control/Monad/Trans/Identity.hi
        _build/stage1/libraries/mtl/build/Control/Monad/RWS/Class.o : _build/stage1/lib/../lib/x86_64-linux-ghc-8.9.20190325/transformers-0.5.5.0/Control/Monad/Trans/Maybe.hi
        _build/stage1/libraries/mtl/build/Control/Monad/RWS/Class.o : _build/stage1/lib/../lib/x86_64-linux-ghc-8.9.20190325/transformers-0.5.5.0/Control/Monad/Trans/Except.hi
        _build/stage1/libraries/mtl/build/Control/Monad/RWS/Class.o : _build/stage1/lib/../lib/x86_64-linux-ghc-8.9.20190325/transformers-0.5.5.0/Control/Monad/Trans/Error.hi
        _build/stage1/libraries/mtl/build/Control/Monad/RWS/Class.o : _build/stage1/lib/../lib/x86_64-linux-ghc-8.9.20190325/transformers-0.5.5.0/Control/Monad/Trans/Class.hi
        _build/stage1/libraries/mtl/build/Control/Monad/RWS/Class.o : _build/stage1/libraries/mtl/build/Control/Monad/Writer/Class.hi
        _build/stage1/libraries/mtl/build/Control/Monad/RWS/Class.o : _build/stage1/libraries/mtl/build/Control/Monad/State/Class.hi
        _build/stage1/libraries/mtl/build/Control/Monad/RWS/Class.o : _build/stage1/libraries/mtl/build/Control/Monad/Reader/Class.hi

        And shake complains of the following missing deps:

        _build/stage0/bin/ghc -Wall -hisuf hi -osuf o -hcsuf hc -static
            -hide-all-packages -no-user-package-db
            '-package-db _build/stage1/lib/package.conf.d'
            '-this-unit-id mtl-2.2.2' '-package-id base-4.13.0.0'
            '-package-id transformers-0.5.5.0' -i
            -i_build/stage1/libraries/mtl/build
            -i_build/stage1/libraries/mtl/build/autogen
            -ilibraries/mtl/. -Iincludes -I_build/generated
            -I_build/stage1/libraries/mtl/build
            -I/home/david/ghc/_build/stage1/lib/x86_64-linux-ghc-8.9.20190325/base-4.13.0.0/include
            -I/home/david/ghc/_build/stage1/lib/x86_64-linux-ghc-8.9.20190325/integer-gmp-1.0.2.0/include
            -I/home/david/ghc/_build/stage1/lib/x86_64-linux-ghc-8.9.20190325/rts-1.0/include
            -I_build/generated -optc-I_build/generated -optP-include
            -optP_build/stage1/libraries/mtl/build/autogen/cabal_macros.h
            -outputdir _build/stage1/libraries/mtl/build
            -Wnoncanonical-monad-instances
            -optc-Werror=unused-but-set-variable -optc-Wno-error=inline
            -c libraries/mtl/Control/Monad/RWS/Class.hs
            -o _build/stage1/libraries/mtl/build/Control/Monad/RWS/Class.o
            -O2 -H32m -Wall -fno-warn-unused-imports
            -fno-warn-warnings-deprecations -Wcompat
            -Wnoncanonical-monad-instances
            -Wnoncanonical-monadfail-instances -XHaskell2010 -XSafe
            -ghcversion-file=/home/david/MEGA/File_Dump/Well-Typed/GHC/_nosync_git/ghc/_build/generated/ghcversion.h
            -Wno-deprecated-flags

        Lint checking error - _build/HEAD_default/stage1/libraries/mtl/build/Control/Monad/RWS/Class.o - 22 values were used but not depended upon:
           Used:  _build/HEAD_default/stage0/lib/settings
           Used:  _build/HEAD_default/stage0/lib/platformConstants
           Used:  _build/HEAD_default/stage0/lib/llvm-targets
           Used:  _build/HEAD_default/stage0/lib/llvm-passes
           Used:  _build/HEAD_default/stage0/lib/package.conf.d/package.cache
           Used:  _build/HEAD_default/stage1/lib/package.conf.d/package.cache
           Used:  _build/HEAD_default/stage1/lib/x86_64-linux-ghc-8.9.20190325/base-4.13.0.0/GHC/Float.hi
           Used:  _build/HEAD_default/stage1/lib/x86_64-linux-ghc-8.9.20190325/base-4.13.0.0/GHC/Base.hi
           Used:  _build/HEAD_default/stage1/lib/x86_64-linux-ghc-8.9.20190325/ghc-prim-0.5.3/GHC/Types.hi
           Used:  _build/HEAD_default/stage1/lib/x86_64-linux-ghc-8.9.20190325/base-4.13.0.0/GHC/Maybe.hi
           Used:  _build/HEAD_default/stage1/lib/x86_64-linux-ghc-8.9.20190325/transformers-0.5.5.0/Control/Monad/Trans/Writer/Lazy.hi
           Used:  _build/HEAD_default/stage1/lib/x86_64-linux-ghc-8.9.20190325/transformers-0.5.5.0/Control/Monad/Trans/Writer/Strict.hi
           Used:  _build/HEAD_default/stage1/lib/x86_64-linux-ghc-8.9.20190325/transformers-0.5.5.0/Control/Monad/Trans/State/Lazy.hi
           Used:  _build/HEAD_default/stage1/lib/x86_64-linux-ghc-8.9.20190325/transformers-0.5.5.0/Control/Monad/Trans/State/Strict.hi
           Used:  _build/HEAD_default/stage1/lib/x86_64-linux-ghc-8.9.20190325/transformers-0.5.5.0/Control/Monad/Trans/Reader.hi
           Used:  _build/HEAD_default/stage1/lib/x86_64-linux-ghc-8.9.20190325/transformers-0.5.5.0/Control/Monad/Trans/List.hi
           Used:  _build/HEAD_default/stage1/lib/x86_64-linux-ghc-8.9.20190325/transformers-0.5.5.0/Control/Monad/Trans/Cont.hi
           Used:  _build/HEAD_default/stage1/lib/x86_64-linux-ghc-8.9.20190325/ghc-prim-0.5.3/GHC/Tuple.hi
           Used:  _build/HEAD_default/stage1/lib/x86_64-linux-ghc-8.9.20190325/base-4.13.0.0/GHC/IO/Exception.hi
           Used:  _build/HEAD_default/stage1/lib/x86_64-linux-ghc-8.9.20190325/integer-gmp-1.0.2.0/GHC/Integer/Type.hi
           Used:  _build/HEAD_default/stage1/lib/x86_64-linux-ghc-8.9.20190325/base-4.13.0.0/Data/Either.hi
           Used:  _build/HEAD_default/stage1/lib/x86_64-linux-ghc-8.9.20190325/base-4.13.0.0/GHC/Natural.hi

        GHC is reading *.hi files that are not reported as
        dependencies by `ghc -M -include-pkg-deps`. This is because
        they are not direct, but /transitive/ dependencies! How do we
        fix these lint errors (again with the goal of using Shake's
        shared cache feature)? Some ideas:

        * Wildly over-approximate dependencies. This may be easier to
        implement but may cause unneeded recompilation (when a false
        dependency changes). Either:
            * `need` all dependent packages' interface files
        recursively as well as transitive dependencies reported by
        `ghc -M -include-pkg-deps` within the current package, OR
            * `need` all transitive dependencies reported by `ghc
        -M -include-pkg-deps`. This will likely result in fewer
        dependencies but requires a bit more work in recovering
        dependent packages' dependency graphs.
        * Perhaps transitive dependencies are not important for shared
        caching to work. Change Shake's linting feature to allow
        (untracked?) transitive dependencies to be accessed.

        Feedback would be greatly appreciated.

        David Eichmann

--
        David Eichmann, Haskell Consultant

        Well-Typed LLP, http://www.well-typed.com

        Registered in England & Wales, OC335890

        118 Wymering Mansions, Wymering Road, London W9 2NF, England

--
David Eichmann, Haskell Consultant
Well-Typed LLP, http://www.well-typed.com

Registered in England & Wales, OC335890
118 Wymering Mansions, Wymering Road, London W9 2NF, England

_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
