Hi Andy!

Andy Wingo <wi...@pobox.com> writes:
> On Wed 16 May 2012 06:23, Mark H Weaver <m...@netris.org> writes:
>
>> It's surprising to me for another reason: in order to make the
>> instructions reasonably compact, only a limited number of bits are
>> available in each instruction to specify which registers to use.
>
> It turns out that being reasonably compact isn't terribly important --
> more important is the number of opcodes it takes to get something done,
> which translates to the number of dispatches.  Have you seen the
> "direct threading" VM implementation strategy?  In that case the opcode
> is not an index into a jump table, it's a word that encodes the pointer
> directly.  So it's a word wide, just for the opcode.  That's what
> JavaScriptCore does, for example.  The opcode is a word wide, and each
> operand is a word as well.
>
> The design of the wip-rtl VM is to allow 16M registers (24-bit
> addressing).  However many instructions can just address 2**8 registers
> (8-bit addressing) or 2**12 registers (12-bit addressing).  We will
> reserve registers 253 to 255 as temporaries.  If you have so many
> registers as to need more than that, then you have to shuffle operands
> down into the temporaries.  That's the plan, anyway.
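(An aside, in case it helps other readers of the list: here is a toy
sketch of what "direct threading" looks like in GNU C, using the
labels-as-values extension.  This is purely my own illustration, not the
wip-rtl code; the point is that each slot in the code stream holds the
handler's address directly, so dispatch is a single indirect jump rather
than an index into a jump table.)

  /* Toy direct-threaded interpreter: opcodes are handler addresses.  */
  #include <stdio.h>

  int
  main (void)
  {
    /* The "compiled" program: each opcode is a word-wide label address.  */
    void *program[] = { &&op_incr, &&op_incr, &&op_print, &&op_halt };
    void **ip = program;          /* instruction pointer */
    long acc = 0;                 /* a single accumulator, for brevity */

  #define NEXT() goto **ip++      /* fetch the next opcode and jump to it */

    NEXT ();

   op_incr:  acc++;                  NEXT ();
   op_print: printf ("%ld\n", acc);  NEXT ();
   op_halt:  return 0;
  }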
I'm very concerned about this design, for the same reason that I was
concerned about NaN-boxing on 32-bit platforms.  Efficient use of memory
is extremely important on modern architectures, because of the vast (and
increasing) disparity between cache speed and RAM speed.  If you can fit
the active set into the cache, that often makes a profound difference in
the speed of a program.

I agree that with VMs, minimizing the number of dispatches is crucial,
but beyond a certain point, having more registers is not going to save
you any dispatches, because they will almost never be used anyway.
2^12 registers is _far_ beyond that point.

As I wrote before concerning NaN-boxing, I suspect that the reason these
memory-bloated designs are so successful in the JavaScript world is that
they are specifically optimized for use within a modern web browser,
which is already a memory hog anyway.  Therefore, if the language
implementation wastes yet more memory it will hardly be noticed.

If I were designing this VM, I'd work hard to allow as many loops as
possible to run completely in the cache.  That means that three things
have to fit into the cache together: the VM itself, the user loop code,
and the user data.  IMO, the sum of these three things should be made as
small as possible.

I certainly agree that we should have a generous number of registers,
but I suspect that the sweet spot for a VM is 256, because it enables
more compact dispatching code in the VM, and yet is more than enough to
allow a decent register allocator to generate good code.

That's my educated guess anyway.  Feel free to prove me wrong :)

    Regards,
      Mark
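P.S. To put a rough number on the size argument: with 256 registers, an
8-bit opcode and three 8-bit register fields pack into a single 32-bit
word, whereas a word-wide opcode plus word-wide operands (as in the
JavaScriptCore scheme described above) costs 32 bytes per instruction on
a 64-bit machine -- an 8x difference in the cache footprint of the user
loop code.  Here is a purely hypothetical sketch of such a packed
encoding, just for illustration; it is not the actual wip-rtl format:

  /* Hypothetical packed encoding: [op:8][dst:8][a:8][b:8] in one
     32-bit word, versus one machine word per field.  */
  #include <stdint.h>
  #include <stdio.h>

  typedef uint32_t packed_insn;  /* 4 bytes per 3-operand instruction */

  struct wordwide_insn           /* word-wide opcode and operands */
  {
    void *op;                    /* handler address (direct threading) */
    uintptr_t dst, a, b;         /* 32 bytes total on a 64-bit machine */
  };

  static packed_insn
  pack (uint8_t op, uint8_t dst, uint8_t a, uint8_t b)
  {
    return ((packed_insn) op << 24) | ((packed_insn) dst << 16)
           | ((packed_insn) a << 8) | b;
  }

  int
  main (void)
  {
    packed_insn i = pack (1 /* add */, 3, 1, 2);
    printf ("packed: %zu bytes, word-wide: %zu bytes\n",
            sizeof i, sizeof (struct wordwide_insn));
    printf ("dst field = %u\n", (unsigned) ((i >> 16) & 0xff));
    return 0;
  }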