"A couple of years ago we started this project called Squeak, which is
not simply an attempt to give the world a free Smalltalk, but an
attempt to give the world a bootstrapping mechanism for something much
better than Smalltalk, and when you fool around with Squeak, please,
please, think of it from that standpoint. Think of how you can
obsolete the damn thing by using its own mechanisms for getting the
next version of itself." - Alan Kay, The Computer Revolution Hasn't
Happened Yet, October 7, 1997, OOPSLA'97 Keynote.
On May 26, 2011, at 1:57 AM, Max OrHai wrote:
Have you looked at Jecel Assumpcao's SiliconSqueak? An awful lot can
be done on the cheap with modern FPGAs, so long as you don't stray
too far from the conventional CPU design space...
On May 26, 2011, at 2:46 AM, Casey Ransberger wrote:
Thanks for recommending Silicon Squeak. Jecel's project is so
awesome! And while I totally can't wait to have one:) I think what I
can do this year will likely be limited to integrating off the shelf
parts. That said, I'm hoping I can create something interesting even
with those constraints. I've bounced email back and forth with
Jecel, and I really like his point of view:)
Allow me to describe a few key features of our SiliconSqueak research
here, as most papers have not been published yet.
SiliconSqueak, like the Xerox Alto, is a microcoded processor. It has
many 32-bit cores, each with a stack and data cache, a minimal 5-stage
instruction pipeline and ring network connections. In microcode we
implement the Squeak bytecodes, so the processor behaves like the
software Squeak Virtual Machine and runs standard Squeak images
bit-identically.
Because we optimized the design for Squeak bytecodes, message sends
are particularly efficient. The use of microcode allows many other
bytecode or virtual machine systems to be implemented easily,
including Lisp, Python and Frank (either as a target assembly/bytecode
or even as the stack-oriented abstract machine itself). The Worlds mechanism
could also be implemented in microcode to achieve a significant speedup.
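To make the microcoded-bytecode idea concrete, here is a minimal sketch in Python of the kind of dispatch loop the microcode implements: fetch a bytecode, select its handler, and run it against an operand stack. The bytecode names and encoding are invented for illustration; they are not SiliconSqueak's actual microcode or Squeak's real bytecode set.

```python
def interpret(bytecodes, literals):
    """Interpret a tiny Squeak-like stack bytecode set (illustrative only)."""
    stack = []
    pc = 0
    while pc < len(bytecodes):
        op, arg = bytecodes[pc]
        pc += 1
        if op == "pushLiteral":          # push literal frame entry onto the stack
            stack.append(literals[arg])
        elif op == "send":               # arg = (selector, argument count)
            selector, nargs = arg
            args = [stack.pop() for _ in range(nargs)]
            receiver = stack.pop()
            # A real VM would do a full method lookup; only #+ is wired up here.
            if selector == "+":
                stack.append(receiver + args[0])
        elif op == "return":             # return top of stack to the caller
            return stack.pop()

# 3 + 4 expressed as bytecodes: push 3, push 4, send #+ with one argument
program = [("pushLiteral", 0), ("pushLiteral", 1),
           ("send", ("+", 1)), ("return", None)]
print(interpret(program, literals=[3, 4]))  # prints 7
```

In hardware, the body of each `if` branch corresponds to a short microcode routine, which is why other bytecode sets can be dropped in by rewriting those routines.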
Although you can emulate almost any system in microcode (as it is
Turing-complete), the further you go from the Squeak bytecode model,
the less efficient it might get. I guess a C compiler targeting the
microcode could be more than 4 times slower than on a processor
optimized to run C.
A typical SiliconSqueak FPGA or ASIC has a number of cores connected
with multiple ring networks to other cores, memory and high-speed
links, ranging from 4 x 3.1 Gbps to 88 x 28 Gbps. I think of it as a
roomful of Altos on a chip. Sending a message to an object (in Squeak
implemented with bytecodes) can be handled in many different ways by
the microcode and underlying hardware ring network. A message can be
sent to any object anywhere. The object can be in the cache, in the local
object heap, in another core or in cores reachable through the
external links to neighboring SiliconSqueak units and routed by the
hardware until it reaches the location of the object. If the object is
external to the interconnected clusters of SiliconSqueaks forming a
supercomputer, it can also be handed off to Smalltalk code running in
the image, that can then send it as IP packets over the internet to a
remote Squeak image (which itself may be running on a manycore
SiliconSqueak or a software-implemented Squeak VM).
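The routing decision described above can be sketched very roughly as a cascade of locality checks. All names and the three-way split are illustrative assumptions, not SiliconSqueak's actual microcode interface:

```python
def route_send(obj_location, local_core, cluster_cores):
    """Decide how a message send reaches its receiver object (sketch)."""
    if obj_location == local_core:
        return "local"     # in this core's cache or local object heap
    if obj_location in cluster_cores:
        return "ring"      # forwarded over the hardware ring network / links
    return "image"         # handed off to Smalltalk code in the image, which
                           # can send IP packets to a remote Squeak image

cluster = {0, 1, 2, 3}                       # cores reachable in hardware
print(route_send(2, local_core=2, cluster_cores=cluster))   # prints local
print(route_send(3, local_core=2, cluster_cores=cluster))   # prints ring
print(route_send(99, local_core=2, cluster_cores=cluster))  # prints image
```

The point of the cascade is that the sender's code is identical in all three cases; only the cost differs.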
A single Squeak image can at runtime distribute its objects among the
memories of all the cores of all SiliconSqueak processors. A message
send to a remote object would run code on the remote core, achieving
parallelism that can be transparent to the Squeak programmer. However,
it can also be utilized for explicit forms of parallelism. By
extending the functionality of the message send behavior different
(parallel) programming models can be accommodated (for example future
sends as in Actors).
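As an analogy for the future sends mentioned above, Python's standard concurrent.futures library shows the programming model: the send returns immediately with a future, and the sender blocks only when it actually needs the answer. This is an illustration of the model, not SiliconSqueak code:

```python
from concurrent.futures import ThreadPoolExecutor

def expensive_answer(n):
    # Stand-in for a method running on a remote core.
    return n * n

with ThreadPoolExecutor() as pool:
    future = pool.submit(expensive_answer, 12)  # non-blocking "future send"
    # ... the sender keeps doing other work here ...
    print(future.result())                      # blocks only now; prints 144
```

On SiliconSqueak the scheduling would be done by the microcode and ring network rather than an operating-system thread pool, but the programmer-visible shape is the same.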
Other auto-tuning algorithms can (transparently) redistribute objects
among the cores to implement other ways to exploit fine- or
coarse-grained parallelism. It is also an option to have multiple
images running on the system, some on a single core, some using
multiple cores. Communicating with external Squeak images on VMs on
Unix or other operating systems will appear, from the programmer's
point of view, as transparent as communication among SiliconSqueak cores.
SiliconSqueak is a power efficient, parallel, reconfigurable
architecture optimized for adaptive compilation. The system includes a
mix of basic and extended processors, where these extensions are
configurable accelerators like the 64 ALU matrix. Other accelerators
like a vector processor, a graphics processor, an MPEG encoder/decoder,
an encrypt/decrypt processor or an FPU can be implemented in hardware.
These accelerators can also be implemented in microcode with the 64
ALU matrix. As the accelerators communicate through the ring
network, they can be daisy-chained to form larger matrices.
In FPGA implementations the ratio of SiliconSqueak cores to
accelerators can change at runtime under the control of the adaptive
compilation. Bytecodes are initially interpreted, can then be
recompiled into microcoded polymorphic inline caches, and a second
recompilation can implement the code on the 64 ALU matrix. A third
level of analysis can identify hotspots and reconfigure the FPGA
hardware accordingly while the code is still running, replacing some
SiliconSqueak cores with 64 ALU matrices or other configurable
accelerators, and vice versa.
In the much faster ASIC implementations you are stuck with the choices
Jecel and I will make, based on the results of running many Squeak
images and letting the adaptive compilation help find the optimal
balance between the number of cores and configurable accelerators.
We ourselves will produce some low-cost FPGA and ASIC systems this
quarter: first a scalable cluster of 8-core SiliconSqueak FPGAs that
you can interconnect with backplanes into supercomputers, then a very
low-cost 16-core ASIC version with 10 Gbps Thunderbolt optical links.
But you can just as easily use one of the many medium- and high-cost
FPGA development boards out there, ranging from $80 (1 core) to
$25,000 (400 cores, 88 x 28 Gbps for a total of 2.7 Tbps per chip).
Most brands of FPGA can accommodate SiliconSqueak easily. An FPGA
implementation is unlikely to fall below $10 per core any time soon,
but ASICs can reach a fraction of a cent per
core. In future WSI (Wafer Scale Integration) the number of
cores and size of cache or memory will be fixed, although a changing
percentage of these cores will not function because of the unavoidable
damage to some parts on the wafer. Depending on the amount of memory
per core, on the size of wafer (8-30 inch) and the process used,
thousands to millions of cores are possible. Eventually, a computer
the size and price of an iPad with a million cores is entirely
feasible.
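As a back-of-envelope check on the wafer-scale numbers, here is the arithmetic with an invented core-plus-cache area and yield figure; neither is a measured SiliconSqueak value:

```python
import math

def wafer_cores(wafer_diameter_mm, core_area_mm2, working_fraction):
    """Usable cores on one wafer: area / core area, scaled by yield."""
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    return int(wafer_area / core_area_mm2 * working_fraction)

# A 300 mm (12 inch) wafer, assuming 0.1 mm^2 per core-plus-cache and
# 70% of cores surviving wafer defects: roughly half a million cores.
print(wafer_cores(300, 0.1, 0.7))
```

Larger wafers, smaller processes, or less memory per core push the count toward the millions mentioned above; more on-wafer memory per core pulls it down toward the thousands.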
Casey, why not just build it?
Our small, self funded research group designed these SiliconSqueak
hardware systems for the next stages of the research in massively
parallel, message passing, late bound dynamic software systems,
growing further and further away from standard Squeak while retaining
backwards compatibility without letting it restrict us.
We welcome anyone who wants to implement their own bytecode
language systems like Frank, Lisp, etc., and we invite people to
collaborate with us on our own Squeak-based research.
Merik Voswinkel
_______________________________________________
fonc mailing list
[email protected]
http://vpri.org/mailman/listinfo/fonc