Hi Andy!

Andy Wingo <wi...@pobox.com> writes:
> On Wed 16 May 2012 06:23, Mark H Weaver <m...@netris.org> writes:
>
>> It's surprising to me for another reason: in order to make the
>> instructions reasonably compact, only a limited number of bits are
>> available in each instruction to specify which registers to use.
>
> It turns out that being reasonably compact isn't terribly important --
> more important is the number of opcodes it takes to get something done,
> which translates to the number of dispatches.  Have you seen the
> "direct threading" VM implementation strategy?  In that case the opcode
> is not an index into a jump table, it's a word that encodes the pointer
> directly.  So it's a word wide, just for the opcode.  That's what
> JavaScriptCore does, for example.  The opcode is a word wide, and each
> operand is a word as well.
>
> The design of the wip-rtl VM is to allow 16M registers (24-bit
> addressing).  However many instructions can just address 2**8 registers
> (8-bit addressing) or 2**12 registers (12-bit addressing).  We will
> reserve registers 253 to 255 as temporaries.  If you have so many
> registers as to need more than that, then you have to shuffle operands
> down into the temporaries.  That's the plan, anyway.
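(An aside, in case it helps other readers of the list: here is a toy
sketch of what "direct threading" looks like in GNU C, using the
labels-as-values extension.  This is purely my own illustration, not the
wip-rtl code; the point is that each slot in the code stream holds the
handler's address directly, so dispatch is a single indirect jump rather
than an index into a jump table.)

  /* Toy direct-threaded interpreter: opcodes are handler addresses.  */
  #include <stdio.h>

  int
  main (void)
  {
    /* The "compiled" program: each opcode is a word-wide label address.  */
    void *program[] = { &&op_incr, &&op_incr, &&op_print, &&op_halt };
    void **ip = program;          /* instruction pointer */
    long acc = 0;                 /* a single accumulator, for brevity */

  #define NEXT() goto **ip++      /* fetch the next opcode and jump to it */

    NEXT ();

   op_incr:  acc++;                  NEXT ();
   op_print: printf ("%ld\n", acc);  NEXT ();
   op_halt:  return 0;
  }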
I'm very concerned about this design, for the same reason that I was
concerned about NaN-boxing on 32-bit platforms.  Efficient use of memory
is extremely important on modern architectures, because of the vast (and
increasing) disparity between cache speed and RAM speed.  If you can fit
the active set into the cache, that often makes a profound difference in
the speed of a program.

I agree that with VMs, minimizing the number of dispatches is crucial,
but beyond a certain point, having more registers is not going to save
you any dispatches, because they will almost never be used anyway.
2^12 registers is _far_ beyond that point.

As I wrote before concerning NaN-boxing, I suspect that the reason these
memory-bloated designs are so successful in the JavaScript world is that
they are specifically optimized for use within a modern web browser,
which is already a memory hog anyway.  Therefore, if the language
implementation wastes yet more memory it will hardly be noticed.

If I were designing this VM, I'd work hard to allow as many loops as
possible to run completely in the cache.  That means that three things
have to fit into the cache together: the VM itself, the user loop code,
and the user data.  IMO, the sum of these three things should be made as
small as possible.

I certainly agree that we should have a generous number of registers,
but I suspect that the sweet spot for a VM is 256, because it enables
more compact dispatching code in the VM, and yet is more than enough to
allow a decent register allocator to generate good code.

That's my educated guess anyway.  Feel free to prove me wrong :)

    Regards,
      Mark
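P.S. To put a rough number on the size argument: with 256 registers, an
8-bit opcode and three 8-bit register fields pack into a single 32-bit
word, whereas a word-wide opcode plus word-wide operands (as in the
JavaScriptCore scheme described above) costs 32 bytes per instruction on
a 64-bit machine -- an 8x difference in the cache footprint of the user
loop code.  Here is a purely hypothetical sketch of such a packed
encoding, just for illustration; it is not the actual wip-rtl format:

  /* Hypothetical packed encoding: [op:8][dst:8][a:8][b:8] in one
     32-bit word, versus one machine word per field.  */
  #include <stdint.h>
  #include <stdio.h>

  typedef uint32_t packed_insn;  /* 4 bytes per 3-operand instruction */

  struct wordwide_insn           /* word-wide opcode and operands */
  {
    void *op;                    /* handler address (direct threading) */
    uintptr_t dst, a, b;         /* 32 bytes total on a 64-bit machine */
  };

  static packed_insn
  pack (uint8_t op, uint8_t dst, uint8_t a, uint8_t b)
  {
    return ((packed_insn) op << 24) | ((packed_insn) dst << 16)
           | ((packed_insn) a << 8) | b;
  }

  int
  main (void)
  {
    packed_insn i = pack (1 /* add */, 3, 1, 2);
    printf ("packed: %zu bytes, word-wide: %zu bytes\n",
            sizeof i, sizeof (struct wordwide_insn));
    printf ("dst field = %u\n", (unsigned) ((i >> 16) & 0xff));
    return 0;
  }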