| I want to say something about a nasty bug that has been in Hugs ever since
| I started using it. Everyone knows it is there, but it is not in the bug
| report!

The implementation of Hugs uses a heap data structure to represent a
wide range of values both at compile-time and at run-time.  If, for
whatever reason, the heap is corrupted, then Hugs is likely to behave
in an erratic fashion, with unexpected and unexplained results, or, at
best, detecting the problem and producing an "Error in graph" message.
Problems of this kind have been reported with both Hugs and Gofer, but 
have always proved very difficult to correct because they are so hard
to reproduce, and because the effects of heap corruption may go
undetected for quite a long time.  The only safe way to clear up after
the heap has been corrupted is to exit and restart the interpreter.
Interestingly, the extent to which such problems occur seems to depend
on the individual concerned, probably as a result of the way that they
use the interpreter.  It might even reflect the fact that different
programming styles exercise some parts of the system more than others.
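To make the "undetected for quite a long time" point concrete, here is a
minimal sketch of a toy two-field cell heap with a free list.  It is not
Hugs's actual storage code; the names (heap_fst, heap_snd, cons, release)
are hypothetical.  If a cell is returned to the free list while something
still refers to it, nothing fails at that moment: the damage only shows up
later, when the cell is handed out again and the stale reference silently
sees someone else's data.

```c
/* Hypothetical toy heap, for illustration only: each cell has two
   fields, and free cells are chained together through heap_snd. */
#define HEAP_SIZE 8
#define NIL (-1)

static int heap_fst[HEAP_SIZE];
static int heap_snd[HEAP_SIZE];
static int free_list = NIL;

/* Put every cell on the free list. */
void heap_init(void) {
    for (int i = 0; i < HEAP_SIZE; i++) {
        heap_snd[i] = (i + 1 < HEAP_SIZE) ? i + 1 : NIL;
    }
    free_list = 0;
}

/* Take a cell from the free list and fill it; returns NIL if the heap is full. */
int cons(int fst, int snd) {
    int c = free_list;
    if (c == NIL) return NIL;
    free_list = heap_snd[c];
    heap_fst[c] = fst;
    heap_snd[c] = snd;
    return c;
}

/* Return a cell to the free list, as a collector does with cells it
   believes are dead.  Nothing checks whether that belief is correct. */
void release(int c) {
    heap_snd[c] = free_list;
    free_list = c;
}
```

For example, after `heap_init()`, calling `a = cons(1, 2)`, then wrongly
calling `release(a)` while `a` is still in use, means the next `cons(42, 99)`
returns the same cell, and `heap_fst[a]` now reads 42 rather than 1.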

The problems are almost certainly the result of interactions between
the main parts of Hugs and its garbage collector.  The garbage
collector is supposed to avoid making changes to any parts of the heap
that are being used elsewhere in the system.  My guess is that, under
some circumstances, the garbage collector overwrites data that it
should not have touched, and that this eventually leads to corruption.
There are two ways that this could happen.  The first would be some
kind of algorithmic problem, such as forgetting to inform the garbage
collector about a particular section of heap that needs to be
retained.  The second would be the result of conflicts between the
assumptions on which the garbage collector is built and the
workings of the C compiler used to build Hugs.  We have always hoped
that only problems of the first kind would occur, because these are
usually easy to reproduce, to track down, and to fix.  But we have
always suspected that at least some of the problems are of the second
kind.  For example, on some machines, compiling without optimization
significantly reduces such problems.  And the problems
seem to occur more on machines with lots of registers, whose compilers
are perhaps more inclined to squirrel away local variables in places
that cannot be seen by the garbage collector.
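The first kind of problem, forgetting to tell the collector about a piece of
heap that must be retained, can be sketched with a toy mark-sweep collector
driven by an explicit root table.  This is an illustrative assumption, not a
description of how Hugs's collector actually finds its roots; all names
(new_cell, push_root, gc) are hypothetical.  A cell that is reachable only
from an unregistered C local variable looks dead to gc(), is reclaimed, and
is then overwritten by the next allocation.

```c
#include <stdbool.h>

/* Hypothetical mark-sweep heap: fst holds a payload, snd holds
   another cell index (or NIL), and roots[] lists the live entry points. */
#define HEAP_SIZE 8
#define MAX_ROOTS 8
#define NIL (-1)

static int  fst[HEAP_SIZE], snd[HEAP_SIZE];
static bool in_use[HEAP_SIZE], marked[HEAP_SIZE];
static int  roots[MAX_ROOTS];
static int  num_roots = 0;

/* Free every cell and forget all roots. */
void heap_reset(void) {
    for (int i = 0; i < HEAP_SIZE; i++) in_use[i] = false;
    num_roots = 0;
}

/* Allocate the lowest-numbered free cell; returns NIL if the heap is full. */
int new_cell(int payload, int next) {
    for (int i = 0; i < HEAP_SIZE; i++) {
        if (!in_use[i]) {
            in_use[i] = true;
            fst[i] = payload;
            snd[i] = next;
            return i;
        }
    }
    return NIL;
}

/* Tell the collector that cell c (and whatever it links to) must be kept. */
void push_root(int c) {
    if (num_roots < MAX_ROOTS) roots[num_roots++] = c;
}

/* Mark phase: follow the snd links from one root. */
static void mark_from(int c) {
    while (c != NIL && !marked[c]) {
        marked[c] = true;
        c = snd[c];
    }
}

/* Collect: any cell in use but unreachable from the roots is freed. */
void gc(void) {
    for (int i = 0; i < HEAP_SIZE; i++) marked[i] = false;
    for (int r = 0; r < num_roots; r++) mark_from(roots[r]);
    for (int i = 0; i < HEAP_SIZE; i++)
        if (in_use[i] && !marked[i]) in_use[i] = false;
}
```

A cell passed to `push_root` survives `gc()`; a cell held only in a local
variable does not, and the next `new_cell` call hands out the same index.
The second kind of problem is harder to show in a few lines, because it
depends on where a particular compiler chooses to keep such locals.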

A couple of months ago, somebody discovered a problem of the first
kind that might have been responsible for the heap corruption problems,
and they posted their findings on this mailing list.  (Please accept
my apologies now for not remembering who that was!)  Although I was
sceptical about the chances of this eliminating *all* of our problems,
the fix did seem to help quite a bit, and no further problems were
reported until now.

Thanks to Koen for letting us know that the problem still exists.
We weren't sure if it was still there, but now we know.  In the future,
we're hoping to separate out the compile-time and run-time heaps,
and that might make a big difference.  But, in the meantime, can
I repeat the appeal I made last time this problem came up: if
anyone finds a way to reproduce this kind of bad behaviour in a
fairly reliable way, then please let us know!

All the best,
Mark
