Hi,
This looks very good. I like the whole approach, and it has the
potential to address most of the issues I have seen when disassembling
guile-2.0 output. A few notes:
1. What about growing stacks? Any comments on whether they will be
easier to manage with this setup? Can one copy the C stack logic?
2. Is there an instruction that does what call does but can be used for
tail calls when needed? E.g. the code
    for (n = 0; n < nargs; n++)
      LOCAL_SET (n, old_fp[ip[4 + n]]);
is missing from the tail-call code.
3. I would appreciate it if the frame were always kept at least, say,
256 SCMs below the fp stack limit. That way, when preparing a tail call,
one doesn't usually need to check whether the arguments fit. If you
compile a function that tail calls with more than 254 (?) arguments,
then you can just as well check, because the check is then essentially
free relative to the argument handling.
4. I think the logic code hook I recently investigated could easily fit
into this VM engine, using techniques similar to those I described in
previous mails.
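
To make point 3 a bit more concrete, here is a rough sketch in C of what
I have in mind. It is not taken from wip-rtl: slot_t, struct vm_stack,
the function names, and both constants are made up for illustration, and
254 is just my guess from above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void *slot_t;               /* hypothetical stand-in for SCM */

enum { TAIL_HEADROOM = 256 };       /* assumed: slots kept free above every frame */
enum { TAIL_ARGS_NO_CHECK = 254 };  /* assumed: the "254 (?)" guess from point 3 */

struct vm_stack {
  slot_t *base;
  slot_t *limit;                    /* one past the last usable slot */
};

/* Done once when a frame is set up: the frame's locals plus the
   headroom must fit below the stack limit. */
static bool
frame_fits (const struct vm_stack *s, const slot_t *fp, size_t nlocals)
{
  assert (fp >= s->base && fp <= s->limit);
  size_t avail = (size_t) (s->limit - fp);
  return nlocals <= avail && avail - nlocals >= TAIL_HEADROOM;
}

/* Small tail calls rely on the headroom; only large ones check. */
static bool
tail_call_needs_check (size_t nargs)
{
  return nargs > TAIL_ARGS_NO_CHECK;
}

/* Returns true if NARGS arguments can be placed at FP for a tail call.
   The fast path trusts that frame_fits held when the frame was set up. */
static bool
tail_args_fit (const struct vm_stack *s, const slot_t *fp, size_t nargs)
{
  if (!tail_call_needs_check (nargs))
    return true;
  return nargs <= (size_t) (s->limit - fp);
}
```

The compiler knows nargs statically for each tail call site, so the
branch in tail_args_fit would normally be decided at compile time and
only the rare large call would emit the explicit check.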
Thanks for your work on this
Stefan
On Fri, May 11, 2012 at 6:19 PM, Andy Wingo <[email protected]> wrote:
> Hi all,
>
> This mail announces some very early work on a register VM. The code is
> in wip-rtl ("work in progress, register transfer language". The latter
> bit is something of a misnomer.). There is not much there yet:
> basically just the VM, an assembler, and a disassembler. Still, it's
> interesting, and I thought people might want to hear more about it.
>
> So, the deal: why is it interesting to switch from a stack VM, which is
> what we have, to a register VM? There are three overriding
> disadvantages to the current VM.
>
> 1) With our stack VM, instructions are smaller. They do less, so you
> need more of them. This increases dispatch cost, which is the
> largest cost of a VM.
>
> 2) On a stack VM, there is a penalty to naming values. Since the only
> values that are accessible to an instruction are the ones on the
> top of the stack, whenever you want to use more names, you end up
> doing a lot of local-ref / local-set operations. In contrast an
> instruction for a register VM can address many more operands, so
> there is much less penalty to storing something on the stack. (The
> penalty is not so much in the storage, but in the pushing and
> popping needed to access it.)
>
> 3) Our stack VM has variable-sized stack frames, so we need to check
> for overflow every time we push a value on the stack. This is
> quite costly.
>
> The WIP register VM fixes all of these issues.
>
> The basic design of the VM is: 32-bit instruction words, 8-bit opcodes,
> variable-length instructions, maximum of 24-bit register addressing, and
> static, relocatable allocation of constants.
>
> Also, with the wip-rtl VM there is no stack pointer: locals are
> addressed directly via the frame pointer, and the call frame for a
> function is of a fixed size. Indeed the IP and FP are the only state
> variables of the VM, which makes it much easier to think about native
> compilation, given the scarcity of CPU registers on some architectures.
>
> See vm-engine.c from around line 1000 for a commented set of
> instructions. It's messy in many ways now, but hey.
>
> As far as performance goes, we won't know yet. But at least for a
> simple loop, counting down from a billion, the register VM is a few
> times faster than the stack VM. Still, I would be happy if the general
> speedup were on the order of 40%. We'll see.
>
> Here's that loop in rtl VM:
>
> (use-modules (system vm rtl))
>
> (assemble-rtl-program
> 0
> '((assert-nargs-ee/locals 1 2)
> (br fix-body)
> loop-head
> (make-short-immediate 2 0)
> (br-if-= 1 2 out)
> (sub1 1 1)
> (br loop-head)
> fix-body
> (mov 1 0)
> (br loop-head)
> out
> (make-short-immediate 0 #t)
> (return 0)))
>
> There are various ways to improve this, but its structure is like what
> the stack VM produces.
>
> Compare to the current opcode:
>
> scheme@(guile-user)> (define (countdown n) (let lp ((n n)) (or (zero? n) (lp (1- n)))))
> scheme@(guile-user)> ,x countdown
> Disassembly of #<procedure countdown (n)>:
>
> 0 (assert-nargs-ee/locals 17) ;; 1 arg, 2 locals
> 2 (br :L186) ;; -> 30
> 6 (local-ref 1) ;; `n'
> 8 (make-int8:0) ;; 0
> 9 (ee?)
> 10 (local-set 2) ;; `t'
> 12 (local-ref 2) ;; `t'
> 14 (br-if-not :L187) ;; -> 21
> 18 (local-ref 2) ;; `t'
> 20 (return)
> 21 (local-ref 1) ;; `n'
> 23 (sub1)
> 24 (local-set 1) ;; `n'
> 26 (br :L188) ;; -> 6
> 30 (local-ref 0) ;; `n'
> 32 (local-set 1)
> 34 (br :L188) ;; -> 6
>
> OK, time to set down the keyboard; been working far too much on this in
> recent days. I still need to adapt the compiler to produce RTL
> bytecode. I am going to let it sit for a week or two before touching it
> again. Comments welcome.
>
> Regards,
>
> Andy
> --
> http://wingolog.org/
>
>