Le 24/02/12 20:23, stewart mackenzie a écrit :
Yves, thanks a lot for that. I understand portions of it, to say the
least - I'll get there eventually.
I'm sure you will :-)
I fear I've been hounding Sébastien with my questions a bit too much!
(Sébastien thank you and much appreciated!)
Henceforth I shall make use of this platform for my questions.
He was faster than me but I'll try to shed another light on your questions.
This is about as much documentation as I have on Oz bytecode:
https://github.com/sjrd/mozart2-bootcompiler/blob/master/src/main/scala/org/mozartoz/bootcompiler/bytecode/OpCode.scala
Is there another more formal reference available?
For the new VM there is also the ultimate reference in:
https://github.com/yjaradin/mozart2/blob/master/vm/main/emulate.cc
For the old VM, the best documentation is in:
https://github.com/mozart/mozart/blob/master/share/lib/compiler/Doc/Instr
Next - I wish to create a test case for floats in
vm/test/factorial_tests.cpp, but I have an issue with obtaining and
indeed _understanding_ Oz bytecode.
It would seem the simplest way to obtain a correct representation of
the bytecode is by making use of the previous mozart-oz compiler and
clicking "oz->emulator code->buffer" in the emacs editor after I have
typed in correct oz code.
Now this is a simple Fibonacci procedure in Oz:
declare N Res
proc {Fibonacci N Res}
if N == 0 then
Res = 0
elseif N == 1 then
Res = 1
else
Left Right
in
thread
{Fibonacci N-1 Left}
end
{Fibonacci N-2 Right}
Res = Left + Right
end
end
after clicking "oz->emulator code->buffer" I get the bytecode output here:
(Bytecode 1)
[...Removed...]
That bytecode is over-optimized by the compiler. A bytecode closer to
the one in main.cc is output for the following code:
declare
fun{F}
proc {Fibonacci N Res}
if N == 0 then
Res = 0
elseif N == 1 then
Res = 1
else
Left Right
in
thread
{Fibonacci N-1 Left}
end
{Fibonacci N-2 Right}
Res = Left + Right
end
end
in
Fibonacci
end
which gives:
%% Assignment of Global Registers:
%% g(0) = F
%% Code Size:
121 % words
skip
lbl(1) definition(x(0) 118 pid('Toplevel abstraction' 0 pos('' 1 0)
[sited] 1) unit [g(0)])
lbl(7) definition(x(0) 112 pid('F' 1 pos('Oz' 2 0) nil 3) <P: 1> nil)
createVariable(x(1))
lbl(15) definition(x(2) 103 pid('Fibonacci' 2 pos('Oz' 3 3) nil 4) unit
[x(1)])
allocateL3
testNumber(x(0) 0 31)
getNumber(0 x(1))
deAllocateL3
return
lbl(31) testNumber(x(0) 1 40)
getNumber(1 x(1))
deAllocateL3
return
lbl(40) createVariable(y(0))
lbl(42) definition(x(2) 65 pid('' 0 pos('Oz' 11 2) nil 3) unit [x(0)
g(0) y(0)])
move(g(0) x(0))
inlineMinus1(x(0) x(2))
move(x(2) x(0))
move(g(2) x(1))
tailCall(g(1) 2)
endDefinition(42)
lbl(65) callBI('Thread.create' [x(2)]#nil)
putConstant(2 x(2))
inlineMinus(x(0) x(2) x(3))
move(x(1) y(1))
move(x(3) x(0))
createVariableMove(y(2) x(1))
call(g(0) 2)
moveMove(y(0) x(0) y(2) x(1))
inlinePlus(x(0) x(1) x(2))
unify(x(2) y(1))
deAllocateL3
return
endDefinition(15)
lbl(103) unify(x(2) x(1))
unify(x(1) x(0))
return
endDefinition(7)
lbl(112) unify(x(0) g(0))
return
endDefinition(1)
lbl(118) tailCall(x(0) 0)
The part that corresponds to the new-style code that you give is from
the line "allocateL3" (just after the line starting with "lbl(15)") to
the line "return" (two lines above the one starting with "lbl(103)"),
and without the lines from "move(g(0) x(0))" (the line after "lbl(42)")
to the line "endDefinition(42)" (the line before "lbl(65)").
The reason for these cuts is that the old VM uses a single codearea for
many procedures, differing by more than the contextual environment. This
is bad for garbage collection, as an abstraction will keep alive the
code and constants of all the things that were compiled with it.
Therefore, the new VM uses a separate codearea for each procedure in the
Oz source (of course, different instances with different contextual
environments will share the codearea).
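The sharing can be sketched as follows; the type and function names here are purely illustrative, not the actual VM's:

```cpp
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Illustrative sketch only: each compiled procedure owns exactly one
// code area; the closures ("instances") created from it share that code
// area through a shared_ptr while carrying their own contextual
// environment in per-instance G registers.  Keeping one instance alive
// can therefore never keep alive the code of unrelated procedures.
struct CodeArea {
    std::vector<std::uint8_t> code;  // opcodes of this one procedure only
};

struct Abstraction {
    std::shared_ptr<CodeArea> area;  // shared between instances
    std::vector<int> gRegisters;     // per-instance environment
};

inline Abstraction instantiate(const std::shared_ptr<CodeArea>& area,
                               std::vector<int> gs) {
    return Abstraction{area, std::move(gs)};
}
```

Two instances built from the same code area point at the same code but carry distinct environments.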
So, removing unneeded code and documenting a bit more what the part
means, we get:
allocateL3
testNumber(x(0) const(0) lbl(31))
getNumber(const(0) x(1))
deAllocateL3
return
lbl(31) testNumber(x(0) const(1) lbl(40))
getNumber(const(1) x(1))
deAllocateL3
return
lbl(40) createVariable(y(0))
definition(x(2) lbl(65) pid('' arity(0) pos('Oz' 11 2) nil
countUsedX(3)) unit [x(0) g(0) y(0)])
lbl(65) callBI('Thread.create' [x(2)]#nil)
putConstant(const(2) x(2))
inlineMinus(x(0) x(2) x(3))
move(x(1) y(1))
move(x(3) x(0))
createVariableMove(y(2) x(1))
call(g(0) arity(2))
moveMove(y(0) x(0) y(2) x(1))
inlinePlus(x(0) x(1) x(2))
unify(x(2) y(1))
deAllocateL3
return
As mentioned in main.cc, the code there is slightly smarter than the
generated code because it avoids allocating Y registers (stack space)
for the first two cases (N==0 and N==1). If the current compiler did the
same, the code would be:
testNumber(x(0) const(0) lbl(31))
getNumber(const(0) x(1))
return
lbl(31) testNumber(x(0) const(1) lbl(40))
getNumber(const(1) x(1))
return
lbl(40) allocateL3
createVariable(y(0))
definition(x(2) lbl(65) pid('' arity(0) pos('Oz' 11 2) nil
countUsedX(3)) unit [x(0) g(0) y(0)])
lbl(65) callBI('Thread.create' [x(2)]#nil)
putConstant(const(2) x(2))
inlineMinus(x(0) x(2) x(3))
move(x(1) y(1))
move(x(3) x(0))
createVariableMove(y(2) x(1))
call(g(0) arity(2))
moveMove(y(0) x(0) y(2) x(1))
inlinePlus(x(0) x(1) x(2))
unify(x(2) y(1))
deAllocateL3
return
Now we can see the way it maps to the new VM code:
Although the above includes the main code block as well as the Fibonacci
code block, the below is only the Fibonacci code block, obtained from
the fibcodeblock in the vm/main/main.cc source file.
The below does not look at all like the above - except in a couple of places.
(Bytecode 2)
// if N == 0
OpInlineEqualsInteger, 0, 0, 4,
This tests whether x(0) is equal to 0, like testNumber did. If it isn't,
we skip four elements in the bytecode (we don't have labels at this
level of abstraction)
// then
// Res = 0
OpUnifyXK, 1, 0,
this one is equivalent to the getNumber line. We don't have (yet) an
optimized opcode for unifying with an integer so we do something more
general, unifying x(1) with k(0) (see below for K registers)
OpReturn,
evident
// elseif N == 1
this is where we arrive if x(0)\=0
OpInlineEqualsInteger, 0, 1, 4,
same as above
// then
// Res = 1
OpUnifyXK, 1, 1,
same as above
OpReturn,
same as above
// else
OpAllocateY, 3,
equivalent to "allocateL 3". We chose to use Y everywhere for these
registers; the old VM was quite inconsistent.
OpCreateVarY, 0,
not too difficult
OpCreateAbstractionK, 0, 3, 3, 2,
we create the abstraction in x(2) with arity 0, using the codearea in
k(3) (the first 3 in the line), and having 3 G registers (the second 3).
The old VM code did essentially the same, except that it had the label
for where to continue (since the actual instructions for the procedure
were just after the "definition" opcode), some debugging info (code
position, identifier) and some extras we don't have/need yet (flags and
refs). It also had what to put in the new G registers, which is what we
do in the next three instructions.
OpArrayInitElementX, 2, 0, 0,
g(0) of the procedure in x(2) is initialized with x(0). This opcode will
also be used to initialize other constructs such as tuples, etc.
OpArrayInitElementG, 2, 1, 0,
g(1) of x(2) is g(0)
OpArrayInitElementY, 2, 2, 0,
g(2) of x(2) is y(0)
//OpCallBuiltin, 4, 1, 2,
actually this version doesn't create a thread (the "new thread" builtin
is in k(4), has arity 1 and takes x(2) as parameter)
OpMoveMoveXYXY, 0, 1, 1, 2,
we save x(0) and x(1) in y(1) and y(2)
OpCallX, 2, 0,
and call x(2) with an arity of 0 (arguments to procedure calls are
placed in the first x registers by calling convention and all x are
caller-saved)
OpMoveMoveYXYX, 1, 0, 2, 1,
we restore x(0) and x(1) from y(1) and y(2)
OpMoveKX, 2, 2,
we place k(2) (== -2) in x(2) as equivalent to the putConstant (again we
could have an optimized opcode here too)
OpInlineAdd, 0, 2, 3,
we don't have subtraction yet, so we do an addition with a negative
constant. Apart from that, it's a literal translation of the old code
OpMoveXY, 1, 1,
trivial
OpMoveXX, 3, 0,
trivial
OpCreateVarMoveY, 2, 1,
trivial
OpCallG, 0, 2,
trivial
OpMoveMoveYXYX, 0, 0, 2, 1,
trivial
OpInlineAdd, 0, 1, 2,
trivial
OpUnifyXY, 2, 1,
trivial
OpDeallocateY,
we don't need to know the size of the stackframe in the new VM
OpReturn,
trivial
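The "skip N elements" convention used instead of labels can be made concrete with a toy dispatch loop. Everything below (the opcode values, the operand layout, the choice of x(1) as result register) is a simplified assumption for illustration, not the real emulate.cc:

```cpp
#include <cstddef>
#include <vector>

// Toy dispatch loop, illustrative only.  When the
// OpInlineEqualsInteger test fails, the program counter jumps forward
// past the then-branch by the amount given in the third operand.
enum Op { OpInlineEqualsInteger, OpMoveKX, OpReturn };

int run(const std::vector<int>& code, std::vector<int>& x,
        const std::vector<int>& k) {
    std::size_t pc = 0;
    for (;;) {
        switch (code[pc]) {
        case OpInlineEqualsInteger: {
            int reg = code[pc + 1], value = code[pc + 2], skip = code[pc + 3];
            pc += 4;                          // past this instruction
            if (x[reg] != value) pc += skip;  // no labels at this level
            break;
        }
        case OpMoveKX:                        // x(dst) := k(src)
            x[code[pc + 2]] = k[code[pc + 1]];
            pc += 3;
            break;
        case OpReturn:
            return x[1];                      // result register, by convention
        }
    }
}
```

With code = {OpInlineEqualsInteger, 0, 0, 4, OpMoveKX, 0, 1, OpReturn, OpMoveKX, 1, 1, OpReturn}, the loop returns k(0) when x(0) is 0 and k(1) otherwise, mirroring the N == 0 test of Bytecode 2.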
Here are a few questions:
- I presumed we are staying true to the previous Mozart bytecode
generation. Why are there such great differences in structure?
Obviously I'm not referring to the understandable differences in
representation, i.e. Oz vs. C.
I hope most of them have been explained above. But ask for more details
if you want.
- Bytecode 1 doesn't make use of K registers whereas Bytecode 2 does -
why doesn't Bytecode 1 make use of K registers?
K registers are the way the new VM deals with compile-time constants.
The old VM just wrote them in the codearea amongst the opcodes and
integer arguments. This forced many algorithms in the old VM to do
code-walking to find all the nodes in a codearea. It was also
problematic for the bigger nodes, for which we have to guarantee
alignment. So we decided to move the constants to proper registers; as
they are constants, they were given the letter K.
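As a rough illustration of the K-register design (the struct and field names here are made up, not the actual VM's):

```cpp
#include <string>
#include <vector>

// Sketch of the K-register idea.  Instead of embedding tagged constant
// nodes between opcodes, the code area keeps every compile-time
// constant in a separate array.  A garbage collector can then reach all
// constants by iterating kRegisters, without decoding ("walking") the
// instruction stream, and the alignment of large nodes inside the code
// is no longer a concern.
struct Constant {
    std::string repr;  // stands in for an arbitrary Oz node
};

struct CodeArea {
    std::vector<int> code;             // opcodes and small integer operands only
    std::vector<Constant> kRegisters;  // compile-time constants live here
};

// What a collector does: visit each constant directly.
int traceConstants(const CodeArea& area) {
    int visited = 0;
    for (const Constant& k : area.kRegisters) {
        (void)k;  // a real collector would mark the node here
        ++visited;
    }
    return visited;
}
```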
- In Bytecode 2, is OpInlineEqualsInteger a hack? If not, what is wrong
with a simple OpInlineEquals (which would automatically handle transient
variables, unbound-to-unbound, strict equality and binding via
polymorphism for each datatype, i.e. floats, automatically)?
As Sébastien explained, nothing is wrong with it, but the old VM uses
testNumber for a good reason: the test is very common and the
specialized opcode is much faster.
- I recall (correctly?) that all builtins are inline. If we are
changing the naming conventions, why have "Inline" as part of the name?
For example OpInlineAdd - OpAdd should be just dandy.
Inlining is a bit overloaded with meanings in this discussion. By
realizing that some call is to a builtin, the Oz compiler can emit a
callBuiltin rather than a form of call suitable for abstractions. This
is a form of inlining, as it allows the compiler to not spill X registers
(the calling convention for builtins is that they don't touch registers
that they don't receive). However, the emulator has to find the C++
procedure corresponding to the builtin from information in the bytecode,
so the actual C++ code of the builtin can't be inlined in the emulator
loop. By having specific opcodes for common builtins, we can have
inlining all the way down to the C++ level, but we can only do that for
the most common builtins.
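The difference between the two paths can be sketched like this; all names below are invented for illustration:

```cpp
#include <vector>

// Sketch of the two builtin-call paths described above.  The generic
// path fetches a function pointer from the constant table at run time,
// so the C++ compiler cannot inline the builtin's body into the
// dispatch loop.  A dedicated opcode like OpInlineAdd performs the
// operation directly in the loop, where the compiler can inline it
// fully.
using Builtin = int (*)(int, int);

int addBuiltin(int a, int b) { return a + b; }

// Generic path: OpCallBuiltin looks the procedure up in k(idx).
int genericCallBuiltin(const std::vector<Builtin>& k, int idx, int a, int b) {
    return k[idx](a, b);  // indirect call through a pointer
}

// Specialized path: the addition is written straight into the emulator.
int opInlineAdd(int a, int b) {
    return a + b;
}
```

Both compute the same result; only the second gives the C++ compiler a chance to inline, which is why it is reserved for the most common builtins.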
- Are there other ways of obtaining oz bytecode for a given snippet of
oz code? I want to reverse engineer the bytecode with simpler examples
of oz code.
Emulator code is quite good for that, especially with the documents
above, but you have to be careful to make your code difficult enough to
optimize, as the Oz compiler is sometimes quite smart and may transform
simple examples beyond recognition.
- are we using gmp (http://gmplib.org/) for floats and other
datatypes? If so what datatypes?
No, only for (big) integers. Floating point numbers are always
approximations, no matter how many decimals we work with, so we just
use normal "double"s in the VMs (both the old and the new one). If one
wanted, defining "big" floats using GMP would certainly be possible, but
we haven't had demand for this.
Now that I'm thinking of it, I'm not sure, but I think bitarrays and
bitstrings might use GMP too.
- If there is no previous documentation, are we going to document the
bytecode? (which includes a map from the old version to the new
version - for sanity reasons)
There are no parts of the VM that we *want* to keep undocumented. But
time is a precious resource, so any help...
-- Now your diagrams - phew (for those who are suffering as much as I
am: http://scifac.ru.ac.za/compilers/cha03g.htm)
Good find! This style of diagram is old, but I thought it was more
widespread than it apparently is.
- Why are we using scala to bootstrap when we want to keep dependencies low?
We need a bootstrap compiler written in anything but Oz (or any language
depending on Oz, but that's not common). The obvious choice for
minimizing dependencies is C++, but writing a compiler (even a simple
one) in C++ is no piece of cake. It would have been my choice, but I
know that by the time Sébastien has a full compiler working, I would
only just have the framework for writing the parsing and rewriting
rules. The decision process was quite easy: to have a compiler working
soon, we needed to write it in a language with strong symbolic support
and pattern matching (Oz fits the bill nicely, but...), and the only
such language any of us had real expertise with was Scala, so Scala it
was. Once all is done and working, rewriting it in C++ would be a nice
exercise. In any case, it is a rather separate component without too
much interaction with the rest, so Scala won't "contaminate" the whole
project.
- I confess I don't understand the bottom-left diagram of the detailed
*.svg. What are ozc and ozc1?
- What does the ----------------------- in light blue represent?
|oz ozc ozf|
------- --------
| oz |
--------
- In fact, could you explain the whole bottom section - if you have the time :)
As Peter wrote, I think I should explain the whole diagram. I'll do that
in another message, this one is already starting to be long.
Thanks Yves and kudos on the design!
You are welcome, I did what I could...
Kind regards
Stewart
Yves
_________________________________________________________________________________
mozart-hackers mailing list
[email protected]
http://www.mozart-oz.org/mailman/listinfo/mozart-hackers