Dave Long writes:
> > Bytecode interpreters can offer unsurpassed code compactness, ...[This  
> > box has] only 2MiB of L2 cache, and presumably something like
> > 16-64KiB of L1 cache.  Thrashing the cache is soundly punished.
> 
> One problem, as you point out earlier, is that the question is not
> size of native code loop vs. size of bytecoded loop, but size of
> native loop vs.  size of (bytecoded loop + its working path through
> the bytecode interpreter).
> 
> I haven't used MSVC in ages, but their compiler used to have the
> option to compile to a bytecode -- still, I think this was only ever
> useful for space, not for speed.  (for that, one must try to arrange
> the functionality to ensure only the outermost loops are being
> interpreted)

The way I remember Joel Spolsky telling the story, when he was on the
Excel team, they maintained their own C compiler specifically in order
to be able to compile to bytecode --- which was crucial to their
ability to compete on the memory-constrained machines of the early
1990s.  (I think this period may have included the RAM bubble, where
memory stayed at $40 a meg for four years.)  Maybe the feature was
integrated into MSVC so the Excel team wouldn't have to maintain their
own C compiler any more.

I have a vague memory that some of the Symbolics folks said that when
they reimplemented their 36-bit CPU as an Alpha AXP program
("OpenGenera"), it fit in the L1 cache of the Alpha, and it ran Lisp
programs about as fast as you'd expect microcode to run on a machine
with that kind of clock speed.
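
To make Dave's working-set point concrete: the hot path is not just
the bytecoded loop but the dispatch loop it keeps flowing through.
Here's a toy stack-machine interpreter in OCaml --- purely my own
sketch, nothing to do with how Genera or OCaml actually work --- and
its handful of match arms are the part that has to stay resident in
L1 while the bytecoded program runs:

    (* A toy stack machine: opcodes are small integers in a bytes
       string, operands live on an int list used as a stack. *)
    let rec run code pc stack =
      if pc >= Bytes.length code then stack
      else
        match Char.code (Bytes.get code pc), stack with
        | 0, _ -> stack                                      (* halt *)
        | 1, st ->                                           (* push immediate *)
            let n = Char.code (Bytes.get code (pc + 1)) in
            run code (pc + 2) (n :: st)
        | 2, a :: b :: st -> run code (pc + 1) (a + b :: st) (* add *)
        | 3, a :: b :: st -> run code (pc + 1) (a * b :: st) (* mul *)
        | _ -> failwith "bad opcode or stack underflow"

    let () =
      (* push 2, push 3, add, halt; building this string at run time
         and handing it to run is about all "eval" takes. *)
      let prog = Bytes.of_string "\001\002\001\003\002\000" in
      match run prog 0 [] with
      | result :: _ -> Printf.printf "%d\n" result
      | [] -> print_endline "empty stack"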

> > I know of no other reason to implement an interpreter using bytecode.
> >
> > So I'm surprised it's such a popular thing to do!  I think the reason
> > is probably that code space and compilation time used to be quite
> > precious resources (not to mention portability), and programmers just
> > haven't adjusted to the new realities.
> 
> Architecture Neutral Debugging Format.

Good point!

> Generating native code is not that difficult -- at least it doesn't
> take much to generate code that will beat an interpreter.  Debugging
> native code, however, can be a real pain (one either has to
> understand the debugging facilities of both the processor and the
> OS[0], or one has to understand the debugger's favorite symbol
> format and provide a decent map to it -- for bass, I was able to
> provide some simple gdb macros as long as the only register used was
> a TOS cache, but that broke immediately after providing register
> allocation)

Maybe this accounts, in part, for the great interest in static
correctness proofs in the functional programming community?  OCaml has
a reportedly very nice debugger that works only with its bytecode, not
with its native code, and the implementors simply try very hard to
ensure that the two back ends have the same behavior, so you never
have to debug the native code.  Having a static type system that
permits type erasure without loss of safety helps with this.
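
For instance, a polymorphic function like the toy one below (my own
example, nothing from the OCaml distribution) never consults a type
at run time; ocamlc and ocamlopt can both lean on OCaml's uniform
value representation, so there's little room for the bytecode and the
native code to disagree:

    (* Polymorphic map: thanks to type erasure, one compiled body
       serves every element type, with no run-time type info needed. *)
    let rec map f = function
      | [] -> []
      | x :: rest -> f x :: map f rest

    let () =
      (* Same observable behavior whether built with ocamlc or ocamlopt. *)
      let squares = map (fun n -> n * n) [1; 2; 3] in
      List.iter print_endline (map string_of_int squares)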

It's too bad gdb hasn't acquired a decent scripting language, all
these years after the Plan9 paper (was it "Acid: a debugger built from
a language"?)

If I remember correctly, bass started out using ColorForth as a basis?
I'm afraid it's been a number of years since I looked at it.  Is there
a place to see it these days?

> [0] it is somewhat instructive to look at DEBUG.COM (an artifact of
> 1980's era PCs still shipped with Redmond OS's to this day) for an
> example of how spartan an IDE can be.

There's the apocryphal story of Leonard Zubkoff implementing a
debugger in a weekend for a new platform, rather than waiting weeks
for someone else to do the job; I wonder how much more or less his
debugger contained than DEBUG.COM.

> Usually one wants an assembler to remember labels, and a debugger to
> remember breakpoints.  DEBUG.COM will turn opcodes into machine
> language, but you're on your own for labels (leading to a style in
> which one tries to minimize entry points, turning code into a
> collection of "loops with tails" and requiring utilities similar to
> BASIC's RENUM)

You can avoid needing RENUM by patching in your additions with jump
instructions, exacerbating the label problem.

What do you mean, "loops with tails"?  You mean that you don't tend to
factor the code into a lot of small functions?

I wonder why DEBUG couldn't remember labels or breakpoints.
FIG-Forth ran on the same hardware and remembered the names of every
routine in the system.  Maybe DEBUG had to fit into a tiny static
corner of a 64K memory segment, to avoid limiting its applicability to
small programs?  But it isn't restricted to disassembling or debugging
tiny-model programs.

> Similarly, DEBUG.COM will handle all the machine level hassle of
> setting breakpoints in code, taking the interrupt, and then
> restoring the original code -- but you type in the list of
> breakpoints at each step.  Once upon a time, people actually
> developed apps with these bear skins and stone knives.

An acquaintance entered an ACM programming competition, back in the
early 1990s; the programming tools available on the competition
machine included Pascal compilers, but no compiler for any language he
knew well.  So he tried creating his entries in DEBUG, and he seemed
to be doing OK for a while, but there was a problem when the judges
asked to see the source code.  They were not impressed when he
disassembled the object code.

> [1] bytecode also makes "eval" almost as trivial as in assembly.  Too much  
> dynamism can get one in trouble, though.  Consider the "house of cards"  
> cartoon on p.112 of the Smalltalk green book: "here, let me modify Array  
> At:"
> http://www.iam.unibe.ch/~ducasse/FreeBooks/BitsOfHistory/BitsOfHistory.pdf

Added to my list of things to download next time I get online.

> In that situation, bytecode can be the difference between hosing a
> process and triple-faulting a box.

Have you been following Ian Piumarta's Pepsi/Coke/Jolt/Albert work?
He's very much focused on questions like how to allow you to modify
Array>>#at: at runtime without breaking everything.  (Of course, if
your modification is broken, you will still have problems...)

    http://piumarta.com/pepsi/objmodel.pdf  (the most comprehensible)
    http://piumarta.com/papers/colas-whitepaper.pdf
    http://stanford-online.stanford.edu/courses/ee380/070214-ee380-300.asx
    (I haven't seen that one yet)

There are some more details and some broader context in

    http://www.vpri.org/html/work/NSFproposal.pdf
    http://www.peerbox.com:8668/space/Albert
    http://irbseminars.intel-research.net/AlanKay.wmv

I'm not sure that the difference between hosing a process and crashing
a box (triple-faulting?) necessarily lies in bytecode versus native
code; if the Array>>#at: implementation is shared among a lot of
different processes, the machine is still going to be fairly useless
if you break it, while if each process has its own memory space and
(at least potentially) its own Array>>#at:, recovery may be easier.

The Art of the Metaobject Protocol, which Andrew Cooke was kind enough
to lend me when we visited him last month, talks a bit about the
metastability issues involved in implementing CLOS within CLOS ---
mostly ensuring that various recursions have a base case.  I think Ian
Piumarta's approach to the problem looks simpler, though.

> > The OCaml version is 24 instructions, 8 of which have immediate
> > constants.  I don't know very much about PowerPC assembly, but let's
> > suppose that every instruction is 32 bits, including any immediate
> > constants; that means the whole function weighs 96 bytes.
> 
> Using gas or "objdump -D -b binary" would help if you want more accurate  
> numbers.  If I remember properly, you're correct about instructions, but  
> not necessarily immediates.

I don't know where to get gas or objdump for this Mac without four
gigabytes of other development environment cruft, which is part of why
I've been using OCaml.

> > [the MuP21] executes a stream of 5-bit zero-operand two-stack
> > operations packed into 20-bit words.
> 
> For more along these lines, see http://www.jwdt.com/~paysan/b16.html

That's a fascinating page!  In the b16-small sources listed there,
there's a LyX paper for EuroForth 2004, which seems to mention a new
Forth core from Chuck Moore called the "c18", described in a paper
cited as "c18 ColorForth Compiler, Chuck Moore, 17th EuroForth
Conference Proceedings, 2001".

I also think that's the first 8-page conference paper I've ever seen
that includes a complete and working (?) synthesizable CPU design (in
noweb and Verilog) in the text of the paper.  (It is synthesizable,
isn't it?  My Verilog is not up to snuff.  The actual Verilog seems to
be about 240 lines.)

It's too bad that he doesn't report on how much FPGA space it needed
or how fast it ran.
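
Speaking of the MuP21 encoding quoted above, the arithmetic of the
packing is easy to play with; here's a rough OCaml sketch (with
invented mnemonics --- I don't have the real MuP21 opcode table in
front of me) of pulling four 5-bit slots out of a 20-bit word:

    (* Unpack a 20-bit instruction word into four 5-bit slots, most
       significant slot first.  The mnemonics are invented for
       illustration, not the real MuP21 assignments. *)
    let unpack word =
      [ (word lsr 15) land 0x1f;
        (word lsr 10) land 0x1f;
        (word lsr 5) land 0x1f;
        word land 0x1f ]

    let name = function
      | 0 -> "nop" | 1 -> "dup" | 2 -> "drop" | 3 -> "plus"
      | n -> Printf.sprintf "op%d" n

    let () =
      (* e.g. dup, plus, drop, nop packed into one 20-bit word *)
      let word = (1 lsl 15) lor (3 lsl 10) lor (2 lsl 5) lor 0 in
      List.iter (fun op -> print_endline (name op)) (unpack word)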

> But these sort of games are better played in hardware than in
> software -- hardware is great at parallel things (instruction
> decoding) and pretty weak at serial things, while software is the
> opposite.  (this was a traditional reason for bytecode -- to
> minimize work for dispatch) FSMs seem to be where the two meet: to
> add behavior to hardware (think microcode) or to add
> parallel-pattern-matching to software (think parsers), one tends to
> wind up synthesizing state machines.

That sounds like a pretty fundamental insight.
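
To make the software half of it concrete, here's the sort of thing
one ends up synthesizing --- a toy table-driven state machine, of my
own devising, that recognizes C-style identifiers by indexing on
(state, character class), which is roughly the shape every generated
lexer has:

    (* Table-driven recognizer for [A-Za-z_][A-Za-z0-9_]*.
       States: 0 = start, 1 = in identifier, 2 = reject (absorbing). *)
    let char_class c =
      if c = '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') then 0
      else if c >= '0' && c <= '9' then 1
      else 2

    (* transition.(state).(char_class) *)
    let transition =
      [| [| 1; 2; 2 |];     (* start: a letter begins an identifier *)
         [| 1; 1; 2 |];     (* in identifier: letters and digits continue it *)
         [| 2; 2; 2 |] |]   (* reject: stays rejected *)

    let is_identifier s =
      let state = ref 0 in
      String.iter (fun c -> state := transition.(!state).(char_class c)) s;
      !state = 1

    let () =
      List.iter
        (fun s -> Printf.printf "%-8s %b\n" s (is_identifier s))
        ["foo_42"; "4chan"; "x"; ""]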

I wonder if the Symbolics microcode was horizontal?  If they used
vertical microcode while the Dorado was horizontal, that might
explain the difference between their perception of Moore's Law and
Alan Kay's --- Kay consistently complains that modern CPU
architectures run Smalltalk and other late-bound things slowly, which
is sort of the opposite of the apocryphal Symbolics comment I cited
earlier.
