Simon Peyton Jones writes:
 > * We also have a plan for how to modularise STG Hugs so that it
 >   consists of a re-useable C library called the Haskell Execution 
 >   Platform (HEP), plus a number of "clients".  Examples of clients are:
 >      - the normal Hugs textual interface
 >      - an updated version of WinHugs (any volunteers?)
 >      - a "scripting engine", that can Haskell-enable a web browser
 >      - a "haskell server" that enables you to wrap up Haskell source
 >        code as a COM object
 >      - and maybe more besides.
 > 
 >   The draft spec for the HEP is at
 >      http://www.haskell.org/ghc/docs/papers/hep.ps.gz

I've just read the paper.

It looks as if the HEP could form the seed of a functional operating system (one
can imagine the HEP staying up for long periods, with new modules and programs
run by it all the while). Knowing your crafty designs, you have probably had
this in mind already. In this sort of situation there are many things that could
be done, and not enough resources to do all of them. As you know well, a good
designer will implement the most important stuff first, and use foresight to
avoid ruling out other important developments later. In this vein I have a few
suggestions, in no particular order:

Reliability is not optional for an OS. There has been talk of how languages like
Haskell might not need memory-protection, since they can guarantee that illegal
accesses will not occur. In practice, people want to call C, so some form of
memory protection may still be necessary - and that could greatly complicate
matters. In the meantime, reliability might be considerably improved
by handling SIGSEGVs (presumably caused by calling C) and throwing an exception
that the HEP would handle at the top-level, rather than dying. Of course this
does not stop C from corrupting the HEP.
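
To make the idea concrete, here is a sketch in modern-GHC terms. Everything in
it is my assumption rather than anything the HEP spec promises: the modules
Control.Exception and System.Posix.Signals, the FFICrash type, and indeed the
whole notion that one can safely resume after a genuine SIGSEGV (POSIX does not
guarantee that). It only shows the shape of "convert the signal to an exception
and catch it at the top level":

```haskell
-- Sketch only: the imported modules and all names below are my own
-- assumptions, not part of the HEP spec.
import Control.Concurrent (myThreadId)
import Control.Exception (Exception, catch, throwIO, throwTo)
import System.Posix.Signals (Handler (Catch), installHandler, sigSEGV)

data FFICrash = FFICrash deriving Show
instance Exception FFICrash

-- Turn a SIGSEGV (presumably raised inside a foreign call) into an
-- ordinary Haskell exception delivered to the calling thread.
protectMain :: IO a -> IO a
protectMain act = do
  tid <- myThreadId
  _ <- installHandler sigSEGV (Catch (throwTo tid FFICrash)) Nothing
  act

-- The HEP top level catches the exception and reports, rather than dying.
topLevel :: IO () -> IO String
topLevel body =
  (protectMain body >> return "ok")
    `catch` \FFICrash -> return "foreign call crashed; HEP survives"
```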

It would be very nice if the system could handle reflection. Clearly it is going
in this direction already, since a lot of the HEP operations are
meta-programming things. I don't understand very much about reflection, and
haven't understood how it would fit into the framework of pure strongly-typed
functional programming. Judging by the many successes of FP, I speculate that
FP and reflection would go very nicely together. Have you come across the TUNES
project (http://www.tunes.org/)? Francois Rideau may get a little carried away
at times, but he makes the very telling point that much/most computer
programming activity occurs simply because of incompatibility between protocols
(e.g. data structures, APIs etc.). Reflective systems are supposed to overcome
this problem, I think by allowing different concrete representations and
supporting automatic conversion between them.

For a small example of the above incompatibilities, see the recent thread on the
Haskell list about reverse-composition. How the token appears to human readers
is not the most important point - what we really want is for each function to
have (from birth) a name that is unique world-wide (this can be done by
exploiting the Domain Name System, as Java modules do). So long as the machines
understand the name, it can be presented to a programmer using whatever image he
feels comfortable with (e.g. using the Maverik GPL'ed virtual reality system -
http://aig.cs.man.ac.uk/systems/maverik/). But the representation should be
abstract. In order to do this well, we probably need reflection.
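
As a tiny illustration of what I mean (all names here are hypothetical - the
DNS-rooted path and the "reverseCompose" example are just made up for the
sketch), the machine-level identity and the programmer-facing image separate
cleanly:

```haskell
-- Hypothetical scheme: a function's identity is a DNS-rooted path plus a
-- local name, as with Java package names. How it is *displayed* to a
-- programmer is a separate, purely presentational binding.
data GlobalName = GlobalName
  { domain    :: [String]  -- e.g. ["org","haskell"], from the DNS
  , localName :: String    -- unique within that domain
  } deriving (Eq, Ord, Show)

-- One canonical rendering; a front end could show any glyph it likes.
render :: GlobalName -> String
render (GlobalName ds n) = concatMap (++ ".") ds ++ n

-- The reverse-composition operator from the recent thread, say:
revCompose :: GlobalName
revCompose = GlobalName ["org", "haskell"] "reverseCompose"
```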

The way the STG machine uses info pointers seems very suitable for reflection
(the info pointer specifies how the data is to be interpreted). At least it
would be sensible to have some discussion with someone who really understands
reflection, and see what is possible.

The paper mentions that entries cannot be ejected from the HEP text table when a
module is deleted. This would be bad for a functional operating system - the
text table would expand relentlessly.
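
One obvious fix (a sketch only - the HEP paper says nothing about how the text
table is represented, and Data.Map here is just a stand-in) is to reference-count
entries by the modules that use them, so deletion can actually eject:

```haskell
import qualified Data.Map as Map

-- A text table keyed by symbol, counting the modules that reference each
-- entry. Loading a module bumps counts; deleting one decrements, and an
-- entry is ejected when its count reaches zero.
type TextTable = Map.Map String Int

addRef, dropRef :: String -> TextTable -> TextTable
addRef  s = Map.insertWith (+) s 1
dropRef s = Map.update (\n -> if n <= 1 then Nothing else Just (n - 1)) s
```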

It would be nice if the new run-time system could support persistent data. This
might not be hard - I've thought about hacking up something like this from time
to time, and it seems a simple implementation would be little more than use of
mmap. Because the (memory-mapped) persistent store will be large, it may
be necessary to separate the persistent and transient heaps for GC purposes - we
only want to GC the persistent store occasionally. OTOH generational GC might do
the trick.

For a quick hack, I thought of committing a persistent data structure
by means of a GC having the data structure(s) as its rootset, and using the
persistent store as the to-space so that (after this GC) the persistent store
will not contain any references to transient memory (this requires prohibiting
mutable variables in the persistent store, otherwise locations could be assigned
to point to transient memory). After such a commit it would be safe to unmap the
store. Closures would be okay if the GC also followed STG info pointers and
tables, so that the relevant code etc. was copied into the persistent store.
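
The commit step can be modelled in a few lines. This is a toy, of course: the
flat address/closure heap, the Closure type and its mutable flag are all
invented for the sketch, and a real implementation would walk info tables
rather than an explicit refs list. But it shows the two rules - copy the
rootset's reachable closures into the store, and refuse mutable data:

```haskell
import qualified Data.Map as Map

-- Toy heap model of the "commit" idea: a copying pass with the persistent
-- structure as rootset and the persistent store as to-space.
type Addr = Int
data Closure = Closure { mutable :: Bool, refs :: [Addr] }

-- Copy everything reachable from the roots into the store. Refuse to
-- commit if a mutable closure is reachable, since it could later be
-- assigned to point back into transient memory.
commit :: Map.Map Addr Closure -> [Addr] -> Maybe (Map.Map Addr Closure)
commit heap = go Map.empty
  where
    go store []     = Just store
    go store (a:as)
      | a `Map.member` store = go store as          -- already copied
      | otherwise = case Map.lookup a heap of
          Nothing                -> Nothing          -- dangling pointer
          Just c | mutable c     -> Nothing          -- reject mutable data
                 | otherwise     ->
                     go (Map.insert a c store) (refs c ++ as)
```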

Persistent storage would be useful with things like the new
Haskell-typechecker-for-Haskell i.e. instead of ghc having to parse the source
again, the typechecker could hand it a persistent data-structure. Indeed, ghc
writes data-structures out to disk between optimisation passes and this slows it 
down a lot - that could be avoided too.

It would be nice if the new system had the potential to be network transparent
in some sense (not sure what the complications are here, but Manuel is working
on this sort of thing e.g. http://www.score.is.tsukuba.ac.jp/~chak/goffin/). In
particular it would be nice to be able to call a function in a
network-transparent (not a kludge like CORBA/DCOM) way e.g. have the arguments
and result marshalled across the network for the first call, and if calls
persist, have the function code cached locally. This sort of thing would be
great for open-source collaboration. Note that with CORBA etc. there is usually
a difference between moving the arguments & result and moving the function code,
but in a purely functional language these should be semantically equivalent. Of
course it would be nice to transparently share persistent data structures as
well...
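
The semantic-equivalence point is easy to see in miniature. In this sketch
Show/Read stand in for a real wire format and a pure function stands in for the
network (both my simplifications, obviously): for a pure f, calling through the
proxy is indistinguishable from calling f locally, which is exactly why moving
the arguments and moving the code should be interchangeable:

```haskell
-- Show/Read as a stand-in wire format.
marshal :: Show a => a -> String
marshal = show

unmarshal :: Read a => String -> a
unmarshal = read

-- The remote side: decode the argument, apply, encode the result.
serve :: (Read a, Show b) => (a -> b) -> String -> String
serve f wire = marshal (f (unmarshal wire))

-- The local proxy: for pure f, observably the same as calling f here.
remoteCall :: (Show a, Read a, Show b, Read b) => (a -> b) -> a -> b
remoteCall f x = unmarshal (serve f (marshal x))
```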

It is a pity that the entire HEP will block on stdin etc. in RunHooks. I guess
the alternative (putting mutexes in the STG machine) is too expensive or too
much work. Maybe a functional OS could do non-blocking I/O in the main thread,
even if this is presented to the application level as blocking I/O. Even so, it
would be bad if the main thread died.
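
The "blocking at the application level, polling underneath" idea might look
like this (a sketch: `runOtherThreads` is a stand-in for whatever scheduler
hook the HEP would expose, and the 50 ms slice is arbitrary):

```haskell
import System.IO

-- Present blocking I/O to the application while the main loop only ever
-- polls: wait in short slices, running other work between slices.
blockingGetChar :: IO () -> Handle -> IO Char
blockingGetChar runOtherThreads h = loop
  where
    loop = do
      ready <- hWaitForInput h 50   -- poll with a 50 ms timeout
      if ready
        then hGetChar h             -- input available: no real block
        else runOtherThreads >> loop
```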

No doubt it has occurred to you that the HEP could transparently compile code
after it has been interpreted sufficiently many times to seem worth
compiling. The compiler for the Self language (a super-Smalltalk that runs at
half the speed of C) dynamically re-compiles code to optimise it for particular
patterns of execution (rather like doing ghc's SPECIALISE pragma on-the-fly).
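
The hot-count trigger is the easy part; here is the shape of it, with plain
functions standing in for "interpreted" and "compiled" versions of the same
code (the names and threshold are illustrative, and the hard part - actually
generating the compiled version - is elided entirely):

```haskell
import Data.IORef

-- Count calls to the interpreted version; past a threshold, switch to
-- the compiled one. Semantically the two arguments should be the same
-- function, differing only in speed.
hotSwap :: Int -> (a -> b) -> (a -> b) -> IO (a -> IO b)
hotSwap threshold interpreted compiled = do
  count <- newIORef 0
  return $ \x -> do
    n <- atomicModifyIORef count (\k -> (k + 1, k + 1))
    return (if n > threshold then compiled x else interpreted x)
```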

Keep 64-bit machines in mind when messing around with Hugs pointer details. The
remark about 16M names being enough sounds unpleasantly familiar, but by the
time this is not enough, 64-bit architectures should be dominant.  I hope the
new run-time will be able to work on Linux/Alpha (I know it was never considered
hard to port ghc to Linux/Alpha, hopefully this will continue to be so).

Tim


