At the time of writing, I am waiting for yet another long Haskell re-make of
lots of modules to complete.  The frustrating thing is that at least 90% of these
remakes are completely unnecessary.  Suppose module A imports module B
and B imports C.  Then my automatically generated Makefile will have lines like
A.o : B.hi
B.o : C.hi

I've attached three such trivial files.  If you do
ghc C.hs -c
ghc B.hs -c
ghc A.hs -c
you'll see that you now have three .hi files.  Each of these starts with
a line (in the case of C) like
   __interface C 1  406 where
The second number is a GHC version and isn't relevant here.  The first number
is the "interface file version". 
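
To make the two numbers concrete, here is a minimal sketch that picks them out of a header line.  It assumes only the layout of the one example line quoted above; the names Header and parseHeader are made up for illustration and are not part of GHC.

```haskell
import Text.Read (readMaybe)

-- Hypothetical record for the header line of a .hi file, assuming the
-- layout "__interface <module> <interface version> <GHC version> where".
data Header = Header
  { hdrModule       :: String
  , hdrIfaceVersion :: Int   -- the "interface file version"
  , hdrGhcVersion   :: Int   -- the GHC version, not relevant here
  } deriving (Eq, Show)

-- Split the line on whitespace and read the two numbers.
-- Returns Nothing for any line that does not have this shape.
parseHeader :: String -> Maybe Header
parseHeader line =
  case words line of
    ["__interface", m, v, g, "where"] ->
      Header m <$> readMaybe v <*> readMaybe g
    _ -> Nothing
```

Applied to the line above, parseHeader gives module "C", interface version 1 and GHC version 406.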

Furthermore, in files that import another file, you'll see a line like (for B):
import C 1 ::;

Suppose you now make a change to C.  For example, suppose you add a line
   pangle = "pangle"
to the end of C, and add "pangle" to the list of exports from C.
Then you recompile C.  You will find that C.hi now has a new version number
"2".  The list of values in the .hi file has of course changed as well.

Now recompile B.  You will then find that even though the interface to B
is precisely the same, the version number attached to B.hi is bumped up
by one.  Thus B.hi is updated and you then get a completely unnecessary
recompilation of A, since A.o depends on B.hi.
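
The knock-on effect can be modelled in a few lines.  This is only a sketch of the behaviour described above, under the assumption that every recompiled module gets a new .hi file; the names exampleDeps, importers and cascade are invented for the illustration.

```haskell
import qualified Data.Map as Map

type Module = String

-- Direct imports of each module in the three-module example.
exampleDeps :: Map.Map Module [Module]
exampleDeps = Map.fromList [("A", ["B"]), ("B", ["C"]), ("C", [])]

-- Modules that directly import the given module.
importers :: Map.Map Module [Module] -> Module -> [Module]
importers deps m = [n | (n, is) <- Map.toList deps, m `elem` is]

-- Modules that get recompiled after m's .hi file changes, assuming each
-- recompilation also changes the .hi file of the recompiled module.
cascade :: Map.Map Module [Module] -> Module -> [Module]
cascade deps m = go [] (importers deps m)
  where
    go seen [] = reverse seen
    go seen (x : xs)
      | x `elem` seen = go seen xs
      | otherwise     = go (x : seen) (xs ++ importers deps x)
```

Here cascade exampleDeps "C" lists B and then A, even though only B mentions C.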

So why does GHC do this?  It's explained a bit in section 3.7.4 of the
GHC User Manual.  The problem is instance declarations.  If instead of
adding a new value "pangle" to C's export list, you'd added a new instance
declaration in C, then A's environment would indeed have changed and you'd
have to recompile it.  So GHC seems to bump up the .hi version number whenever
anything in the interface file changes. 

There is another, less serious problem.  Suppose that B had imported C only
to export it again (by listing it in the export list).  Then a change to
what C exports really does change B's interface.  I think the best way to deal
with this is simply to copy over all the types and other information
from C's .hi file to B's.  (Other information includes, for example,
"To get this value B.whatsit you really need to link to the symbol C_whatsit"
as well as regurgitated code for inlined functions.)
To keep the file from exploding, the same symbol should never be quoted twice.
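
One way to make sure no symbol is quoted twice is to key the copied declarations by their original defining name.  The following is a rough sketch, not a description of how GHC stores anything; Name, Decl and mergeReexports are hypothetical.

```haskell
import qualified Data.Map as Map

type Name = String  -- original defining name, e.g. "C.whatsit"
type Decl = String  -- placeholder for a type signature, unfolding, etc.

-- Combine a module's own declarations with those copied from re-exported
-- modules.  Keying on the original name means a symbol reached through
-- several re-exports is still recorded only once.
mergeReexports :: Map.Map Name Decl -> [Map.Map Name Decl] -> Map.Map Name Decl
mergeReexports own reexported = Map.unions (own : reexported)
```

Map.unions keeps the first entry it sees for a key, so a declaration that arrives by two routes occupies a single slot.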
  
Thus .hi files will contain a complete description of the interface, 
except for indirect information about the instances.  .hi files should 
also be structured in four parts, as follows:
(1) the header (including the .hi file version number)
(2) directly imported modules (listing their version numbers)
(3) the instance declarations
(4) type and value declarations.
Since (4) includes every exported type and value declaration, it could
get rather large if the user adopts the bad programming practice of
allowing interfaces to explode by exporting everything but the kitchen sink.
But GHC only has to parse it when the module is directly imported by
the file being compiled.  In other cases it should not be parsed.
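
As a sketch, the four parts could be pictured as a record like the one below.  The field names are my own invention and the declarations are left as plain strings; a real .hi file holds much richer data.

```haskell
-- Hypothetical in-memory picture of the four-part .hi file.
data Interface = Interface
  { ifModule    :: String
  , ifVersion   :: Int              -- (1) header: the .hi file version number
  , ifImports   :: [(String, Int)]  -- (2) directly imported modules and versions
  , ifInstances :: [String]         -- (3) instance declarations
  , ifDecls     :: [String]         -- (4) type and value declarations
  } deriving (Eq, Show)

-- What is wanted from a module that is not imported directly:
-- parts (1) to (3) only, with part (4) dropped.
forIndirectImport :: Interface -> Interface
forIndirectImport i = i { ifDecls = [] }
```

Putting (4) last means a reader can stop before the potentially large part.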

There are now three algorithms to describe:
(A) Writing out the .hi file at the end of compilation
    The only tricky thing here is whether or not we bump up the version number.
    I say we only bump up the version number when either the version number of
    a directly imported module has changed, or we have altered the imported modules,
    or we have altered the instance declarations.  So if only information in (4) has
    changed, we don't touch the version number.  As before, if the resulting .hi
    file would be identical to the last one (meaning that the whole interface is
    identical), we don't actually touch the .hi file at all.
(B) The recompilation checker.
    (Actually this is just doing what Make does, and so isn't really needed anyway.)
    Don't recompile if the source file and all directly imported .hi files are
    older than .o or .hi.
(C) Reading the imports at the start of a compilation.
    Read all .hi files for directly imported files.  Read (1)-(3) for all
    indirectly imported files.
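
The version rule in (A) can be sketched as follows.  This is only my reading of the rule under simplifying assumptions: declarations are plain strings, a first compilation starts at version 1, and the names Iface, WriteAction and writeInterface are made up.

```haskell
-- Simplified interface: version, direct imports with their versions,
-- instance declarations, and type/value declarations.
data Iface = Iface
  { version   :: Int
  , imports   :: [(String, Int)]
  , instances :: [String]
  , decls     :: [String]
  } deriving (Eq, Show)

data WriteAction
  = LeaveUntouched  -- whole interface identical: do not touch the .hi file
  | Write Iface     -- write this interface out
  deriving (Eq, Show)

-- Decide what to write, given the previous interface (if there is one)
-- and the freshly computed parts (2), (3) and (4).
writeInterface :: Maybe Iface -> [(String, Int)] -> [String] -> [String] -> WriteAction
writeInterface Nothing imps insts ds = Write (Iface 1 imps insts ds)
writeInterface (Just old) imps insts ds
  -- Imports (including their versions) and instances unchanged: keep the version.
  | imps == imports old && insts == instances old =
      if ds == decls old
        then LeaveUntouched
        else Write (old { decls = ds })
  -- Otherwise something in (2) or (3) changed: bump the version.
  | otherwise = Write (Iface (version old + 1) imps insts ds)
```

With this rule, a changed imported version number or a changed instance list bumps the version, while a change confined to (4) rewrites the file under the same version.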

This scheme is not the cleverest that could be devised.  For example it is
still necessary to recompile whole chains of modules if you add an import
declaration.  (This does not apply to a system library; imports from those are counted as
"stable" in both GHC's system and mine.)  To fix this you would need to keep more
information in the .hi file.  But I still think my scheme would save me a lot
of GHC compilations.

A.hs

B.hs

C.hs
