Simon Marlow wrote: "I suggest that the way to start would be to design and
build the infrastructure first, and then think about replacing GHC's build
system." He also wrote: "But if someone else were to do the work, and the
result was maintainable and has at least the same functionality and
performance, then it's a possibility." In a broad sense, since all software
needs to be maintained, maintainability is a desirable characteristic. Yes,
this would be a good design goal.
I'm going to address this point for a moment and then comment on performance,
then functionality. What factors influence whether something is maintained?
What comes to my mind is that things that are not vital tend to go
unmaintained for lack of interest. That the build system succeeds, by whatever
means necessary, is a vital interest. If a replacement is adopted and does not
prove to be worse than the previous way of doing things, it should stick. Good
design and documentation help, and so does a satisfactory feature set. Shell
scripts and makefiles were once state of the art, cutting-edge stuff; today
they no longer are. There is no reason to believe that a replacement won't be
better, with one possible exception, namely convenience: what sort of learning
curve is involved, both for the maintainers of the software and for its
consumers.
It is my impression that Haskell has a steep learning curve, but that is not
altogether relevant when you are making an appeal to people who have already
made the investment in Haskell. It might be a sticking point if we were to try
to sell the idea to people who have little interest in making a large
investment in becoming familiar with yet another language, especially one with
a high learning curve. Regardless of the learning curve, however, there are
benefits to standardization, to having one language instead of many; the Ada
language is one such example.
I believe that Haskell is competitive as far as economy of keystrokes is
concerned. It just needs the same sort of convenient functionality that shell
scripts possess, and this seems a matter of interest and motivation rather
than of whether it can be done. You can have both a conventional, C-like,
low-level interface that is a chore to use but flexible, and a higher-level
interface where you pass string arguments in a form similar to what would
appear on a command line. You might ask, would such a thing be efficient? That
is where Template Haskell (TH) comes to the rescue: you can parse that string
argument without incurring a run-time penalty. As far as the build system is
concerned, I don't believe this is where the real performance gains will be
achieved, for reasons I will discuss shortly, but there are other reasons to
use TH.
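As a sketch of that higher-level interface (the function name is mine, not an
existing library): a small tokenizer that splits a command-line-like string
while respecting double quotes. A TH quasiquoter could run this same function
at compile time, so the splitting would cost nothing at run time.

```haskell
module Tokenize (tokenize) where

-- Split a command-line-like string into words, treating text inside
-- double quotes as a single token. Single quotes, escapes, and nested
-- quoting are deliberately ignored in this sketch.
tokenize :: String -> [String]
tokenize = go . dropWhile (== ' ')
  where
    go [] = []
    go ('"':rest) =
      -- quoted token: everything up to the closing double quote
      let (tok, rest') = break (== '"') rest
      in tok : go (dropWhile (== ' ') (drop 1 rest'))
    go s =
      -- bare token: everything up to the next space
      let (tok, rest) = break (== ' ') s
      in tok : go (dropWhile (== ' ') rest)
```

For example, `tokenize "cp -r \"My Dir\" dest"` yields
`["cp","-r","My Dir","dest"]`, keeping the space inside the quoted directory
name intact.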
It may be possible to use TH to emit standard-compliant Haskell, that is,
Haskell without language extensions, which could make the source code
accessible to Haskell compilers that do not implement the sort of language
extensions that GHC does.
To discuss performance: it is doubtful that its performance would be worse.
There is likely a reason why few have bothered with speeding up the execution
of shell scripts and makefiles, however: the time isn't being spent executing
the script. The time is being spent waiting for the file system to carry out a
requested task. Consequently, it would not be shocking if no real gains were
seen, but that presupposes that we keep the old way of doing things.
I am proposing a new way of doing things. Shell scripts and makefiles were
designed to make heavy use of the file system. Why? If you only had 4
kilobytes of RAM at your disposal, you worked within your budget whether you
liked it or not. Instead of storing data in a block of memory, what did you
do? You shipped it out to a file. Allegedly, the file cache on the machine
makes all of this irrelevant. In practice, it is merely an improvement. After
running a few benchmarks, you will discover that it is best to access the
file system as rarely as possible, not as frequently as possible. This is
easily demonstrated.
How long will it take to copy a 100-megabyte file? Compare that with how long
it will take to copy 100,000 files that are one kilobyte each. The build
system will copy 100,000 files, and then do it a hundred times over. When you
realize this, it begins to make sense why they call them nightly builds. If
you assume that your RAM budget is tight, the design makes sense; this is the
way to do it. It works just like CPU registers: when registers are scarce, you
have to push values onto the stack and pop them off repeatedly. Traditional
build systems do the same thing, except that the stack is the file system, and
that is what makes the cost nonlinear. At the root of it is the assumption the
build software makes about the amount of RAM that is available. Today, we are
far removed from this constraint. We can afford to begin thinking differently.
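The claim about file counts is easy to check with a throwaway benchmark.
Here is a scaled-down sketch (the directory names, file sizes, and counts are
arbitrary choices of mine): it copies one 1 MB file, then 1,000 files of 1 KB
each, the same total number of bytes, and times both.

```haskell
import Control.Monad (forM_)
import Data.Time.Clock (diffUTCTime, getCurrentTime)
import System.Directory (copyFile, createDirectoryIfMissing)

-- Time an IO action and print its wall-clock duration.
timed :: String -> IO () -> IO ()
timed label act = do
  t0 <- getCurrentTime
  act
  t1 <- getCurrentTime
  putStrLn (label ++ ": " ++ show (diffUTCTime t1 t0))

main :: IO ()
main = do
  createDirectoryIfMissing True "src"
  createDirectoryIfMissing True "dst"
  -- one 1 MB file versus 1,000 files of 1 KB each (same total bytes)
  writeFile "src/big" (replicate (1000 * 1024) 'x')
  forM_ [1 :: Int .. 1000] $ \i ->
    writeFile ("src/" ++ show i) (replicate 1024 'x')
  timed "one large file   " (copyFile "src/big" "dst/big")
  timed "1,000 small files" $
    forM_ [1 :: Int .. 1000] $ \i ->
      copyFile ("src/" ++ show i) ("dst/" ++ show i)
```

On a typical machine the many-small-files case loses, because every copy pays
per-file open, close, and metadata costs that the single large copy pays only
once; the exact ratio depends on the disk and the state of the file cache.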
How the GHC executable and C compiler executables work is consistent with the
sort of model used by shell scripts and makefiles; consequently, I anticipate
that some tweaking of exactly how they work may be necessary, but I am not
entirely certain of this. One of my thoughts is to concatenate all the modules
and feed them to GHC or the C compiler as one large file or stream. This
should result in dramatic improvements in speed, as well as in the quality of
the compiler's optimizations; that is what you generally observe when you do
this, and the http://www.sqlite.org/ project does it. It may be convenient and
desirable to ensure that GHC can accept a stream on standard input, for
example. It may be further useful to modify the GHC source code so that GHC
can remain resident in memory, so no time is wasted calling it repeatedly like
a CGI script.
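A minimal sketch of the concatenation idea for C sources (the function name
is mine; a real amalgamation, like sqlite's, also has to deal with headers
and clashing static symbols, which this ignores):

```haskell
module Amalgamate (amalgamate) where

-- Concatenate source files into a single "amalgamation" file, with a
-- banner comment marking where each original file begins.
amalgamate :: [FilePath] -> FilePath -> IO ()
amalgamate srcs out = do
  chunks <- mapM readFile srcs
  writeFile out (concat (zipWith banner srcs chunks))
  where
    banner path body = "/* ==== " ++ path ++ " ==== */\n" ++ body
```

Compiling the single result lets the compiler see every function at once,
which is where the cross-file optimization comes from, and it replaces many
per-file compiler invocations with one.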
As far as first things first is concerned, what I want to do is exactly as
Simon Marlow suggested: begin with the library. But I want to do more than
this, because I feel stopping there would ultimately prove to be a mistake.
The reason is that such an approach would encourage wrong thinking. We have to
make something that solves the same problems as a shell script or makefile,
does so with a competitive degree of convenience, but also employs a different
paradigm. What I intend to do first is create a shell-script and makefile
interpreter implemented in Haskell. That's phase one. We then stop using the
sh and make executables and use the Haskell replacements. At this point the
chief benefits from having done all of this will be type safety and program
analysis. You will know that a variable is unquoted and that, consequently,
the expression it appears in cannot cope with a space correctly. This alone
will be useful; many more errors will be caught early on with less effort. It
will also mean that projects that have nothing to do with Haskell could
benefit from it. It would be evangelical.
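As a sketch of the kind of check I mean (simplified: it ignores single quotes,
${...} braces, escapes, and here-documents): scan a script and report every
$variable used outside double quotes, where word splitting on spaces can bite.

```haskell
module UnquotedVars (unquotedVars) where

import Data.Char (isAlphaNum)

-- Walk the script text, tracking whether we are inside double quotes,
-- and collect the name of each $variable seen outside them.
unquotedVars :: String -> [String]
unquotedVars = go False
  where
    go _ [] = []
    go inQ ('"':rest) = go (not inQ) rest          -- toggle quote state
    go False ('$':rest) =                          -- bare $name: report it
      let (name, rest') = span (\c -> isAlphaNum c || c == '_') rest
      in if null name then go False rest' else name : go False rest'
    go inQ (_:rest) = go inQ rest
```

For example, `unquotedVars "cp $src \"$dst\""` reports `["src"]`: $dst is
safely quoted, $src is not. A Haskell interpreter for shell scripts could
issue this kind of warning for every script it runs, which is the early error
detection I have in mind.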
Phase two will be to supplant the shell scripts and makefiles altogether with
Haskell. Here we will decide what the Haskell way is going to be.
Though I believe it will require a lot of work, I believe the investment to be
a wise one that will pay dividends. I am furthermore enthusiastic about it and
have the time. I believe that this has intrinsic worth and is something worth
spending time on.
_______________________________________________
Cvs-ghc mailing list
Cvs-ghc@haskell.org
http://www.haskell.org/mailman/listinfo/cvs-ghc