Simon Marlow wrote: "I suggest that the way to start would be to design and
build the infrastructure first, and then think about replacing GHC's build
system." He also wrote: "But if someone else were to do the work, and the
result was maintainable and has at least the same functionality and
performance, then it's a possibility." In a broad sense, since all software
needs to be maintained, maintainability is a desirable characteristic. Yes,
this would be a good design goal.
I'm going to address this point for a moment and then comment on performance,
then functionality. What factors influence whether something is maintained?
What comes to my mind is that things that are not vital tend to go
unmaintained for lack of interest. That the build system succeeds, by whatever
means necessary, is a vital interest. If a replacement is adopted and does not
prove to be worse than the previous way of doing things, it should stick. Good
design and documentation help, and so does a satisfactory feature set. Shell
scripts and makefiles were once state of the art, cutting-edge stuff; today
they no longer are. There is no reason to believe that a replacement won't be
better, with one possible exception, namely convenience: what sort of learning
curve is involved, both for the maintainers of the software and for its
consumers.
It is my impression that Haskell has a steep learning curve, but that is not
altogether relevant when you are making an appeal to people who have already
made the investment in Haskell. It might be a sticking point if we were to try
to sell the idea to people who have little interest in making a large
investment in becoming familiar with yet another language, especially one with
a high learning curve. Regardless of the learning curve, however, there are
benefits to standardization, to having one language instead of many; the Ada
language is one such example.
I believe that Haskell is competitive as far as economy of keystrokes is
concerned. It just needs the same sort of convenient functionality that shell
scripts possess, and this seems a matter of interest and motivation rather
than of whether it can be done. You can have both a conventional, C-like,
low-level interface that is a chore to use but flexible, and a higher-level
interface where you pass string arguments in a form similar to what would
appear on a command line. You might ask, would such a thing be efficient? That
is where Template Haskell (TH) comes to the rescue: you can parse that string
argument without incurring a run-time penalty. As far as the build system is
concerned, I don't believe this is where the real performance gains will be
achieved, for reasons I will discuss shortly, but there are other reasons to
use TH.
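As a sketch of that higher-level interface (the function name is mine, not an
existing library): a small tokenizer that splits a command-line-like string
while respecting double quotes. A TH quasiquoter could run this same function
at compile time, so the splitting would cost nothing at run time.

```haskell
module Tokenize (tokenize) where

-- Split a command-line-like string into words, treating text inside
-- double quotes as a single token. Single quotes, escapes, and nested
-- quoting are deliberately ignored in this sketch.
tokenize :: String -> [String]
tokenize = go . dropWhile (== ' ')
  where
    go [] = []
    go ('"':rest) =
      -- quoted token: everything up to the closing double quote
      let (tok, rest') = break (== '"') rest
      in tok : go (dropWhile (== ' ') (drop 1 rest'))
    go s =
      -- bare token: everything up to the next space
      let (tok, rest) = break (== ' ') s
      in tok : go (dropWhile (== ' ') rest)
```

For example, `tokenize "cp -r \"My Dir\" dest"` yields
`["cp","-r","My Dir","dest"]`, keeping the space inside the quoted directory
name intact.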
It may be possible to use TH to emit standard-compliant Haskell, that is,
Haskell without language extensions, which could make the source code
accessible to Haskell compilers that do not implement the sort of language
extensions that GHC does.
To discuss performance: it is doubtful that its performance would be worse.
There is likely a reason why few have bothered with speeding up the execution
of shell scripts and makefiles, however: the time isn't being spent executing
the script. The time is being spent waiting for the file system to carry out a
requested task. Consequently, it would not be shocking if no real gains were
seen, but that presupposes that we keep the old way of doing things.
I am proposing a new way of doing things. Shell scripts and makefiles were
designed to make heavy use of the file system. Why? If you only had 4
kilobytes of RAM at your disposal, you worked within your budget whether you
liked it or not. Instead of storing data in a block of memory, what did you
do? You shipped it out to a file. Allegedly, the file cache on the machine
makes all of this irrelevant. In practice, it is merely an improvement. After
running a few benchmarks, you will discover that it is best to access the
file system as rarely as possible, not as frequently as possible. This is
easily demonstrated.
How long will it take to copy a 100-megabyte file? Compare that with how long
it will take to copy 100,000 files that are one kilobyte each. The build
system will copy 100,000 files, and then do it a hundred times over. When you
realize this, it begins to make sense why they call them nightly builds. If
you assume that your RAM budget is tight, the design makes sense; this is the
way to do it. It works just like CPU registers: when registers are scarce, you
have to push values onto the stack and pop them off repeatedly. Traditional
build systems do the same thing, except that the stack is the file system, and
that is what makes the cost nonlinear. At the root of it is the assumption the
build software makes about the amount of RAM that is available. Today, we are
far removed from this constraint. We can afford to begin thinking differently.
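The claim about file counts is easy to check with a throwaway benchmark.
Here is a scaled-down sketch (the directory names, file sizes, and counts are
arbitrary choices of mine): it copies one 1 MB file, then 1,000 files of 1 KB
each, the same total number of bytes, and times both.

```haskell
import Control.Monad (forM_)
import Data.Time.Clock (diffUTCTime, getCurrentTime)
import System.Directory (copyFile, createDirectoryIfMissing)

-- Time an IO action and print its wall-clock duration.
timed :: String -> IO () -> IO ()
timed label act = do
  t0 <- getCurrentTime
  act
  t1 <- getCurrentTime
  putStrLn (label ++ ": " ++ show (diffUTCTime t1 t0))

main :: IO ()
main = do
  createDirectoryIfMissing True "src"
  createDirectoryIfMissing True "dst"
  -- one 1 MB file versus 1,000 files of 1 KB each (same total bytes)
  writeFile "src/big" (replicate (1000 * 1024) 'x')
  forM_ [1 :: Int .. 1000] $ \i ->
    writeFile ("src/" ++ show i) (replicate 1024 'x')
  timed "one large file   " (copyFile "src/big" "dst/big")
  timed "1,000 small files" $
    forM_ [1 :: Int .. 1000] $ \i ->
      copyFile ("src/" ++ show i) ("dst/" ++ show i)
```

On a typical machine the many-small-files case loses, because every copy pays
per-file open, close, and metadata costs that the single large copy pays only
once; the exact ratio depends on the disk and the state of the file cache.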
How the GHC executable and C compiler executables work is consistent with the
sort of model used by shell scripts and makefiles; consequently, I anticipate
that some tweaking of exactly how they work may be necessary, but I am not
entirely certain of this. One of my thoughts is to concatenate all the modules
and feed them to GHC or the C compiler as one large file or stream. This
should result in dramatic improvements in speed, as well as in the quality of
the compiler's optimizations; that is what you generally observe when you do
this, and the http://www.sqlite.org/ project does it. It may be convenient and
desirable to ensure that GHC can accept a stream on standard input, for
example. It may be further useful to modify the GHC source code so that GHC
can remain resident in memory, so no time is wasted calling it repeatedly like
a CGI script.
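A minimal sketch of the concatenation idea for C sources (the function name
is mine; a real amalgamation, like sqlite's, also has to deal with headers
and clashing static symbols, which this ignores):

```haskell
module Amalgamate (amalgamate) where

-- Concatenate source files into a single "amalgamation" file, with a
-- banner comment marking where each original file begins.
amalgamate :: [FilePath] -> FilePath -> IO ()
amalgamate srcs out = do
  chunks <- mapM readFile srcs
  writeFile out (concat (zipWith banner srcs chunks))
  where
    banner path body = "/* ==== " ++ path ++ " ==== */\n" ++ body
```

Compiling the single result lets the compiler see every function at once,
which is where the cross-file optimization comes from, and it replaces many
per-file compiler invocations with one.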
As far as first things first is concerned, what I want to do is exactly as
Simon Marlow suggested: begin with the library. But I want to do more than
this, because I feel stopping there would ultimately prove to be a mistake.
The reason is that such an approach would encourage wrong thinking. We have to
make something that solves the same problems as a shell script or makefile,
does so with a competitive degree of convenience, but also employs a different
paradigm. What I intend to do first is create a shell-script and makefile
interpreter implemented in Haskell. That's phase one. We then stop using the
sh and make executables and use the Haskell replacements. At this point the
chief benefits from having done all of this will be type safety and program
analysis. You will know that a variable is unquoted and that, consequently,
the expression it appears in cannot cope with a space correctly. This alone
will be useful; many more errors will be caught early on with less effort. It
will also mean that projects that have nothing to do with Haskell could
benefit from it. It would be evangelical.
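As a sketch of the kind of check I mean (simplified: it ignores single quotes,
${...} braces, escapes, and here-documents): scan a script and report every
$variable used outside double quotes, where word splitting on spaces can bite.

```haskell
module UnquotedVars (unquotedVars) where

import Data.Char (isAlphaNum)

-- Walk the script text, tracking whether we are inside double quotes,
-- and collect the name of each $variable seen outside them.
unquotedVars :: String -> [String]
unquotedVars = go False
  where
    go _ [] = []
    go inQ ('"':rest) = go (not inQ) rest          -- toggle quote state
    go False ('$':rest) =                          -- bare $name: report it
      let (name, rest') = span (\c -> isAlphaNum c || c == '_') rest
      in if null name then go False rest' else name : go False rest'
    go inQ (_:rest) = go inQ rest
```

For example, `unquotedVars "cp $src \"$dst\""` reports `["src"]`: $dst is
safely quoted, $src is not. A Haskell interpreter for shell scripts could
issue this kind of warning for every script it runs, which is the early error
detection I have in mind.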
Phase two will be to supplant the shell scripts and makefiles altogether with
Haskell. Here we will decide what the Haskell way is going to be.
Though I believe it will require a lot of work, I believe the investment to be
a wise one that will pay dividends. I am furthermore enthusiastic about it and
have the time. I believe that this has intrinsic worth and is something worth
spending time on.
_______________________________________________
Cvs-ghc mailing list
Cvs-ghc@haskell.org
http://www.haskell.org/mailman/listinfo/cvs-ghc