Bytecode interpreters can offer unsurpassed code compactness, ...[This
box has] only 2MiB of L2 cache, and presumably something like
16-64KiB of L1 cache. Thrashing the cache is soundly punished.
One problem, as you point out earlier, is that the question is not size of
native code loop vs. size of bytecoded loop, but size of native loop vs.
size of (bytecoded loop + its working path through the bytecode
interpreter).
I haven't used MSVC in ages, but their compiler used to have the option to
compile to a bytecode -- still, I think this was only ever useful for
space, not for speed. (for speed not to suffer, one must arrange the
functionality so that only the outermost loops are being interpreted)
I know of no other reason to implement an interpreter using bytecode.
So I'm surprised it's such a popular thing to do! I think the reason
is probably that code space and compilation time used to be quite
precious resources (not to mention portability), and programmers just
haven't adjusted to the new realities.
Architecture Neutral Debugging Format.
Generating native code is not that difficult -- at least it doesn't take
much to generate code that will beat an interpreter. Debugging native
code, however, can be a real pain (one either has to understand the
debugging facilities of both the processor and the OS[0], or one has to
understand the debugger's favorite symbol format and provide a decent map
to it -- for bass, I was able to provide some simple gdb macros as long as
the only register used was a TOS cache, but that broke as soon as I added
register allocation)
Bytecode allows the debugger to control execution[1], and while bytecode
constructs still have to be mapped back to source, at least the language
developer (having done the mapping in the other direction) has a single,
easier job than when going all the way down to the iron, where a more
difficult job must be repeated for each platform.
-Dave
:: :: ::
[0] it is somewhat instructive to look at DEBUG.COM (an artifact of 1980s-era
PCs still shipped with Redmond OS's to this day) for an example of how
spartan an IDE can be. Usually one wants an assembler to remember labels,
and a debugger to remember breakpoints. DEBUG.COM will assemble mnemonics
into machine code, but you're on your own for labels (leading to a style in
which one tries to minimize entry points, turning code into a collection
of "loops with tails" and requiring utilities similar to BASIC's RENUM)
Similarly, DEBUG.COM will handle all the machine level hassle of setting
breakpoints in code, taking the interrupt, and then restoring the original
code -- but you type in the list of breakpoints at each step. Once upon a
time, people actually developed apps with these bearskins and stone
knives.
[1] bytecode also makes "eval" almost as trivial as in assembly. Too much
dynamism can get one in trouble, though. Consider the "house of cards"
cartoon on p.112 of the Smalltalk green book: "here, let me modify Array
at:"
http://www.iam.unibe.ch/~ducasse/FreeBooks/BitsOfHistory/BitsOfHistory.pdf
In that situation, bytecode can be the difference between hosing a process
and triple-faulting a box.
The OCaml version is 24 instructions, 8 of which have immediate
constants. I don't know very much about PowerPC assembly, but let's
suppose that every instruction is 32 bits, including any immediate
constants; that means the whole function weighs 96 bytes.
Using gas or "objdump -D -b binary" would help if you want more accurate
numbers. If I remember properly, you're correct about instructions (PowerPC
instructions are a fixed 32 bits), but not necessarily immediates: a
constant too large for an instruction's 16-bit immediate field costs an
extra instruction to load.
[the MuP21] executes a stream of 5-bit zero-operand two-stack
operations packed into 20-bit words.
For more along these lines, see http://www.jwdt.com/~paysan/b16.html
But this sort of game is better played in hardware than in software --
hardware is great at parallel things (instruction decoding) and pretty
weak at serial things, while software is the opposite. (this was a
traditional reason for bytecode -- to minimize work for dispatch) FSMs
seem to be where the two meet: to add behavior to hardware (think
microcode) or to add parallel-pattern-matching to software (think
parsers), one tends to wind up synthesizing state machines.