Simon Marlow wrote:
> 
> > In my experience, it pays off to store as many as possible
> > of the intermediate files (and even the linked binary during
> > development) on a local disk (e.g., on /tmp).  This is
> > *much* faster than going via NFS.
> 
> Absolutely.
> 
> While I suspect that zipping all the .hi files might be a win over NFS, it
> might very well be a loss on a local disk.  And decent NFSv3 implementations
> (of which there is 1, as I recall :-) will do the caching properly so you
> only get the slowdown the first time.
I disagree.  The trouble is that file system calls, whether over NFS or to a
local disk, provide a lot more functionality than reading a zip archive does.
For example, a file system has to cope with multiple processes reading and
writing different files in the same directory simultaneously, and it pays for
that generality on every access.  My recollection from MLj is that while you
save quite a lot of time by putting all the files on local disk, you don't
save as much as you do by writing archive files.
> 
> I think the biggest win would come from dumping out the .hi files in some
> binary format which can be slurped straight back in again when ghc starts,
> avoiding the costly lexical analysis/parsing stages we go through now.  For
> this, we need a decent binary I/O library, though...
I don't think so.  As I said, when I run ghc/hsc the process only uses about
50% of the available CPU time, and for most of that time the only system calls
it makes are the ones accessing .hi files.  My instinct is always that waiting
for I/O is far more expensive than CPU time.  I note that gzipped .hi files are
a lot smaller, so it might well be cheaper to store the .hi files in a
compressed zip archive.
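
To make that concrete, here is a rough sketch of what bundling the .hi files
into one archive and reading one back might look like.  It uses the
zip-archive package from Hackage (not anything GHC actually does), and the
archive name "interfaces.zip" is invented purely for illustration:

import Codec.Archive.Zip
import qualified Data.ByteString.Lazy as L
import System.Directory (getDirectoryContents)
import Data.List (isSuffixOf)

main :: IO ()
main = do
  -- Bundle every .hi file in the current directory into one archive.
  his <- filter (".hi" `isSuffixOf`) <$> getDirectoryContents "."
  archive <- addFilesToArchive [] emptyArchive his
  L.writeFile "interfaces.zip" (fromArchive archive)     -- invented name

  -- Later, a single open and read of the archive replaces one
  -- file system round trip per interface file.
  arch <- toArchive <$> L.readFile "interfaces.zip"
  case his of
    []      -> putStrLn "no .hi files found"
    (f : _) ->
      case findEntryByPath f arch of
        Just e  -> print (L.length (fromEntry e))        -- size of that interface
        Nothing -> putStrLn ("missing entry: " ++ f)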

As a matter of fact I do agree that fast binary versions of show/read would be
a good idea, but I don't think they will help (much) with this particular problem.
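
Still, for what a binary write/read of interface data could look like, here is
a minimal sketch using the Data.Binary package from Hackage; the Iface type
and file name are invented purely for illustration and have nothing to do with
GHC's real interface representation:

{-# LANGUAGE DeriveGeneric #-}
import Data.Binary (Binary, encodeFile, decodeFile)
import GHC.Generics (Generic)

-- A toy stand-in for the information an interface file carries.
data Iface = Iface
  { ifaceModule  :: String              -- module name
  , ifaceExports :: [String]            -- exported identifiers
  , ifaceDecls   :: [(String, String)]  -- (name, pretty-printed type)
  } deriving (Show, Generic)

instance Binary Iface                   -- generic encoding, no hand-written parser

main :: IO ()
main = do
  let iface = Iface "Data.Maybe" ["maybe", "fromMaybe"]
                    [("maybe", "b -> (a -> b) -> Maybe a -> b")]
  -- Writing and re-reading the binary form skips lexing and parsing entirely.
  encodeFile "Data_Maybe.hib" iface     -- invented file name
  iface' <- decodeFile "Data_Maybe.hib" :: IO Iface
  print iface'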

I've now looked at the man page describing the ar format, and I think the zip
format would in fact be better: every zip file carries a complete directory at
its end, so you normally only have to read one disk block for the directory,
plus the disk blocks for the individual files you actually want.  It would be
tempting to make up one's own format (zip wastes quite a few bytes per entry),
but zip files have the overwhelming advantage that excellent public domain
tools already exist for manipulating them.
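
To illustrate the point about the directory sitting at the end, here is a
sketch that reads only the tail of an archive and decodes the
end-of-central-directory record (signature 0x06054b50) to report how many
entries there are and where the directory starts.  The archive name is again
invented, and real code would also have to handle the Zip64 variant:

import qualified Data.ByteString as B
import Data.Bits (shiftL, (.|.))
import Data.Word (Word32)
import System.IO

-- Little-endian field decoders.
le16 :: B.ByteString -> Int -> Int
le16 bs i = fromIntegral (B.index bs i)
        .|. (fromIntegral (B.index bs (i + 1)) `shiftL` 8)

le32 :: B.ByteString -> Int -> Word32
le32 bs i = foldr (\k acc -> acc `shiftL` 8 .|. fromIntegral (B.index bs (i + k))) 0 [0 .. 3]

main :: IO ()
main = do
  h    <- openBinaryFile "interfaces.zip" ReadMode    -- invented name
  size <- hFileSize h
  -- The record is within the last 22 bytes plus an (up to 64K) comment,
  -- so reading the tail of the file is enough to find it.
  let tailLen = fromIntegral (min size 66000)
  hSeek h SeekFromEnd (negate (fromIntegral tailLen))
  chunk <- B.hGet h tailLen
  hClose h
  let sig  = B.pack [0x50, 0x4b, 0x05, 0x06]          -- "PK\5\6"
      hits = [ i | i <- [B.length chunk - 22, B.length chunk - 23 .. 0]
                 , B.take 4 (B.drop i chunk) == sig ]
  case hits of
    []      -> putStrLn "no end-of-central-directory record found"
    (i : _) -> putStrLn $
         show (le16 chunk (i + 10)) ++ " entries; directory of "
      ++ show (le32 chunk (i + 12)) ++ " bytes at offset "
      ++ show (le32 chunk (i + 16))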
