Smalltalk performance and Moore's Law

Kragen Javier Sitaker Mon, 05 Mar 2007 00:37:05 -0800

Previous version posted at
http://lambda-the-ultimate.org/node/531#comment-23457 on 2006-12-25.


This is a partial rebuttal to Alan Kay's occasional assertion that
computers aren't nearly as much faster at executing late-bound things
like Smalltalk as you would expect from Moore's Law.

In an interview with ACM Queue, Kay says [7]:

    Just as an aside, to give you an interesting benchmark --- on
    roughly the same system, roughly optimized the same way, a
    benchmark from 1979 at Xerox PARC runs only 50 times faster
    today. Moore’s law has given us somewhere between 40,000 and
    60,000 times improvement in that time. So there’s approximately a
    factor of 1,000 in efficiency that has been lost by bad CPU
    architectures.

But Moore's Law is about price-performance, not absolute performance;
here I estimate that the actual loss of price-performance attributable
to bad CPU architectures is perhaps a factor of 10 to 50, and it is
plausible that better compilers can remedy this.

Guesswork
=========

"Resuna" writes [6]:

    The [VAX] 11/780 was 3.6 MHz, 32-bit words. I don't know how fast
    the Alto or Dorado were, but with the Dorado being the
    archetypical "3M" machine I assume its performance was comparable
    to a nominally 1-MIPS 11/780.

According to Wikipedia [0], the Dorado was an all-ECL machine.  The
abstract to Lampson and Pier's paper on the Dorado [1], which I
haven't read, says it ran at 20MHz, had 16 hardware threads to provide
zero-context task switching, and was built out of "approximately 3000
MSI [ECL] components".  So it was considerably faster than a VAX.
Maybe one of the older D-machines was the archetypical "3M" machine.

Apparently it could run 200k-400k Smalltalk bytecodes per second [2].
I'm guessing that the Dorado is the machine whose benchmark Kay was
alluding to, since it was introduced in 1979, and the context of the
conversation is the value of machines designed to execute high-level
languages efficiently.

I don't think it was ever sold commercially (or even mass-produced
in-house), which makes per-unit costs difficult to calculate.
However, if we assume that each of its 3000 chips cost $20
(unfortunately I have no real idea how much ECL chips cost in 1980),
that's a $60 000 bill-of-materials cost.  So it might have cost
$100 000 per machine if it had been mass-produced, and since it was
ECL, the electrical power to run it would likely have cost more per
chip as well.

According to the squeak-dev thread on the subject [3], modern 600MHz
uniprocessors are about 20x the speed of the Dorado when running
Squeak, or 35 million bytecodes per second (which sounds more like
100x the speed of the Dorado, actually).
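
As a quick check of that parenthetical, here is the arithmetic in
Python, using only the figures quoted above (35 million bytecodes per
second for the 600MHz machine, 200k-400k for the Dorado).  The
variable names are mine, and the inputs are estimates, not
measurements.

```python
# Figures quoted in the text; the names are just for illustration.
modern_bps = 35_000_000          # 600MHz uniprocessor running Squeak [3]
dorado_bps = (200_000, 400_000)  # estimated range for the Dorado [2]

# Speedup against the fast and slow ends of the Dorado estimate.
speedup_low = modern_bps / dorado_bps[1]
speedup_high = modern_bps / dorado_bps[0]

print(f"{speedup_low:.1f}x to {speedup_high:.1f}x")  # 87.5x to 175.0x
```

So the bytecode rates imply roughly 90x to 175x, well above the 20x
quoted in the same thread.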

However, the uniprocessors in question cost US$150 or so, which is
inflation-equivalent to maybe US$75 in 1980 dollars.  (They also
include hundreds of megabytes of RAM, instead of the 8MB on the
Dorado.)

If you were going to spend $100 000 today (or when Kay gave this
interview) on a computer to run Smalltalk on, you would probably get a
Beowulf of 50 nodes, each node of which could run bytecodes at 50 to
200 times the speed of a Dorado, and that's running Squeak, which is
not designed to be a particularly high-performance Smalltalk.  So
Moore's Law has still given us, by my rough estimates, a factor of
2500 to 10 000 in price/performance in this case (50 nodes times 50x
to 200x per node).  (That's not counting the difference between 8
megs of RAM and 50 000 megs of RAM, or the advantage of having 10TB
of disk, etc.)  A factor of 2500 is still noticeably less than the
131072x (2^17) improvement that you might predict from a naive
application of Moore's Law, but the remaining factor of 10-50 is
probably explicable in terms of Kay's explanation: the architecture
is not optimized for Smalltalk bytecode execution, so you get a
10-50x slowdown when you use it as if it were a Dorado.

(You might be able to get a Beowulf of 300 nodes at that price,
depending on other circumstances.)
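
The price/performance estimate for the 50-node case can be written
out as a short Python sketch.  All inputs are the rough guesses from
the text, and it assumes the workload spreads perfectly across the
nodes, which the estimate takes for granted; the variable names are
mine.

```python
# All inputs are the rough estimates from the text, not measurements.
nodes = 50                    # Beowulf nodes for about $100 000
per_node_speedup = (50, 200)  # each node relative to one Dorado
naive_moore = 2 ** 17         # 131072x, the naive Moore's Law prediction

# Aggregate throughput per $100 000, relative to one $100 000 Dorado.
gain_low = nodes * per_node_speedup[0]
gain_high = nodes * per_node_speedup[1]

# Shortfall relative to the naive prediction.
shortfall_max = naive_moore / gain_low
shortfall_min = naive_moore / gain_high

print(gain_low, gain_high)                              # 2500 10000
print(f"{shortfall_min:.1f}x to {shortfall_max:.1f}x")  # 13.1x to 52.4x
```

That is where the "factor of 10-50" comes from: about 13x at the
optimistic end and about 52x at the pessimistic end.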

How much faster are other Smalltalk implementations than Squeak?
Various microbenchmarks seem to peg Strongtalk at 3x-10x faster than
Squeak (Avi Bryant's [4], David Griswold/Klaus Witzel's [5]), which
would compensate for much of the remainder of Kay's complaint.
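
To see how far a 3x-10x speedup goes against a 10-50x shortfall, here
is a rough Python sketch that divides one range by the other.  It
assumes the microbenchmark speedups would carry over to whatever
benchmark Kay had in mind, which nothing here establishes; the names
are mine.

```python
# Ranges taken from the text; both are rough estimates.
shortfall = (10, 50)       # remaining factor attributed to architecture
strongtalk_gain = (3, 10)  # Strongtalk speedup over Squeak [4], [5]

best = shortfall[0] / strongtalk_gain[1]   # smallest gap, largest speedup
worst = shortfall[1] / strongtalk_gain[0]  # largest gap, smallest speedup

print(f"{best:.1f}x to {worst:.1f}x")  # 1.0x to 16.7x
```

Under those assumptions the leftover gap is somewhere between nothing
and about 17x.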

References
==========

[0] Wikipedia article "Xerox Alto", section "Diffusion and Evolution",
as of 2006-12-25
> http://en.wikipedia.org/wiki/Xerox_Alto#Diffusion_and_evolution

[1] "A Processor for a High-Performance Personal Computer", from
Butler W. Lampson and Kenneth A. Pier, Xerox PARC, 1980, IEEE
"CH1494-4/80/0000-0146" (whatever that means), 15 pp.; mentions, among
other things, that the first machine "came up in the spring of 1979".
> http://research.microsoft.com/Lampson/24-DoradoProcessor/Acrobat.pdf

[2] Squeak-dev post "Dorado bytecodes per second", from Bruce ONeel
(edoneel at sdf.lonestar.org), 2005-05-28T16:41:49 CEST, quoting
previous post from Jecel Assumpcao Jr (jecel at merlintec.com):

    By running the benchmarks for the "green book" and doing a lot of rough
    extrapolations, my guess is that the Dorado would get between 200K and
    400K bytecodes/sec.

And followup from Tim Rowledge (tim at rowledge.org):

    That is pretty much what I remember as the claim for Dorados.

> http://lists.squeakfoundation.org/pipermail/squeak-dev/2005-April/091211.html

[3] Squeak-dev post "Dorado bytecodes per second", from Jecel
Assumpcao Jr (jecel at merlintec.com), 2005-05-28T22:38:19 CEST ---
he's talking about 600MHz ARMs.
> http://lists.squeakfoundation.org/pipermail/squeak-dev/2005-April/091215.html

[4] Blog post "Ruby and Strongtalk II", by Avi Bryant, on his blog
"HREF Considered Harmful"; the microbenchmark in question did a
billion accesses of a thousand-element array of small integers, took
0.7 seconds in Java, 7 seconds in Strongtalk, 70 seconds in Squeak, or
16 if you use Array instead of ByteArray.
> http://smallthought.com/avi/?p=17

[5] Squeak-dev thread "Thue-Morse and performance: Squeak
v.s. Strongtalk v.s. VisualWorks", started by Klaus D. Witzel
2006-12-17; several people, including David Griswold, point out flaws
in Witzel's initial benchmark, and the results are interesting.
> http://www.nabble.com/Thue-Morse-and-performance:-Squeak-v.s.-Strongtalk-v.s.-VisualWorks-t2834773.html

[6] Comment "I still want to see Kay's benchmark...", from "Resuna",
2005-07-22
> http://lambda-the-ultimate.org/node/531#comment-7895

[7] ACM Queue article "A Conversation with Alan Kay: Big Talk with the
creator of Smalltalk --- and much more.", by Stuart Feldman and Alan
Kay, vol. 2, no. 9, Dec/Jan 2004-2005, is the origin of this quote.
> http://acmqueue.com/modules.php?name=Content&pa=showpage&pid=273&page=3
