Le 24/02/12 20:23, stewart mackenzie a écrit :
Yves, thanks a lot for that. I understand portions of it, to say the
least - I'll get there eventually.
I'm sure you will :-)
I fear I've been hounding Sébastien with my questions a bit too much!
(Sébastien thank you and much appreciated!)
Henceforth I shall make use of this platform for my questions.
He was faster than me but I'll try to shed another light on your questions.
This is about as much documentation as I have on Oz bytecode:
https://github.com/sjrd/mozart2-bootcompiler/blob/master/src/main/scala/org/mozartoz/bootcompiler/bytecode/OpCode.scala
Is there another more formal reference available?
For the new VM there is also the ultimate reference in:
https://github.com/yjaradin/mozart2/blob/master/vm/main/emulate.cc
For the old VM, the best documentation is in:
https://github.com/mozart/mozart/blob/master/share/lib/compiler/Doc/Instr
Next - I wish to create a test case for floats in
vm/test/factorial_tests.cpp, but I have an issue with obtaining and
indeed _understanding_ Oz bytecode.
It would seem the simplest way to obtain a correct representation of
the bytecode is by making use of the previous mozart-oz compiler and
clicking "oz->emulator code->buffer" in the emacs editor after I have
typed in correct oz code.
Now this is a simple Fibonacci procedure in Oz:
declare N Res
proc {Fibonacci N Res}
if N == 0 then
Res = 0
elseif N == 1 then
Res = 1
else
Left Right
in
thread
{Fibonacci N-1 Left}
end
{Fibonacci N-2 Right}
Res = Left + Right
end
end
after clicking "oz->emulator code->buffer" I get the bytecode output here:
(Bytecode 1)
[...Removed...]
That bytecode is over-optimized by the compiler. A bytecode closer to
the one in main.cc is output for the following code:
declare
fun{F}
proc {Fibonacci N Res}
if N == 0 then
Res = 0
elseif N == 1 then
Res = 1
else
Left Right
in
thread
{Fibonacci N-1 Left}
end
{Fibonacci N-2 Right}
Res = Left + Right
end
end
in
Fibonacci
end
which gives:
%% Assignment of Global Registers:
%% g(0) = F
%% Code Size:
121 % words
skip
lbl(1) definition(x(0) 118 pid('Toplevel abstraction' 0 pos('' 1 0)
[sited] 1) unit [g(0)])
lbl(7) definition(x(0) 112 pid('F' 1 pos('Oz' 2 0) nil 3) <P: 1> nil)
createVariable(x(1))
lbl(15) definition(x(2) 103 pid('Fibonacci' 2 pos('Oz' 3 3) nil 4) unit
[x(1)])
allocateL3
testNumber(x(0) 0 31)
getNumber(0 x(1))
deAllocateL3
return
lbl(31) testNumber(x(0) 1 40)
getNumber(1 x(1))
deAllocateL3
return
lbl(40) createVariable(y(0))
lbl(42) definition(x(2) 65 pid('' 0 pos('Oz' 11 2) nil 3) unit [x(0)
g(0) y(0)])
move(g(0) x(0))
inlineMinus1(x(0) x(2))
move(x(2) x(0))
move(g(2) x(1))
tailCall(g(1) 2)
endDefinition(42)
lbl(65) callBI('Thread.create' [x(2)]#nil)
putConstant(2 x(2))
inlineMinus(x(0) x(2) x(3))
move(x(1) y(1))
move(x(3) x(0))
createVariableMove(y(2) x(1))
call(g(0) 2)
moveMove(y(0) x(0) y(2) x(1))
inlinePlus(x(0) x(1) x(2))
unify(x(2) y(1))
deAllocateL3
return
endDefinition(15)
lbl(103) unify(x(2) x(1))
unify(x(1) x(0))
return
endDefinition(7)
lbl(112) unify(x(0) g(0))
return
endDefinition(1)
lbl(118) tailCall(x(0) 0)
The part that corresponds to the new-style code that you give is from
the line "allocateL3" (just after the line starting with "lbl(15)") to
the line "return" (two lines above the one starting with "lbl(103)"),
and without the lines from "move(g(0) x(0))" (the line after "lbl(42)")
to the line "endDefinition(42)" (the line before "lbl(65)").
The reason for these cuts is that the old VM uses a single codearea for
many procedures, differing by more than the contextual environment. This
is bad for garbage collection, as an abstraction will keep alive the
code and constants of all the things that were compiled with it.
Therefore, the new VM uses a separate codearea for each procedure in the
Oz source (of course, different instances with different contextual
environments will share the codearea).
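The sharing can be sketched as follows; the type and function names here are purely illustrative, not the actual VM's:

```cpp
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Illustrative sketch only: each compiled procedure owns exactly one
// code area; the closures ("instances") created from it share that code
// area through a shared_ptr while carrying their own contextual
// environment in per-instance G registers.  Keeping one instance alive
// can therefore never keep alive the code of unrelated procedures.
struct CodeArea {
    std::vector<std::uint8_t> code;  // opcodes of this one procedure only
};

struct Abstraction {
    std::shared_ptr<CodeArea> area;  // shared between instances
    std::vector<int> gRegisters;     // per-instance environment
};

inline Abstraction instantiate(const std::shared_ptr<CodeArea>& area,
                               std::vector<int> gs) {
    return Abstraction{area, std::move(gs)};
}
```

Two instances built from the same code area point at the same code but carry distinct environments.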
So, removing unneeded code and documenting a bit more what the part
means, we get:
allocateL3
testNumber(x(0) const(0) lbl(31))
getNumber(const(0) x(1))
deAllocateL3
return
lbl(31) testNumber(x(0) const(1) lbl(40))
getNumber(const(1) x(1))
deAllocateL3
return
lbl(40) createVariable(y(0))
definition(x(2) lbl(65) pid('' arity(0) pos('Oz' 11 2) nil
countUsedX(3)) unit [x(0) g(0) y(0)])
lbl(65) callBI('Thread.create' [x(2)]#nil)
putConstant(const(2) x(2))
inlineMinus(x(0) x(2) x(3))
move(x(1) y(1))
move(x(3) x(0))
createVariableMove(y(2) x(1))
call(g(0) arity(2))
moveMove(y(0) x(0) y(2) x(1))
inlinePlus(x(0) x(1) x(2))
unify(x(2) y(1))
deAllocateL3
return
As mentioned in main.cc, the code there is slightly smarter than the
generated code because it avoids allocating Y registers (stack space)
for the first two cases (N==0 and N==1). If the current compiler did the
same, the code would be:
testNumber(x(0) const(0) lbl(31))
getNumber(const(0) x(1))
return
lbl(31) testNumber(x(0) const(1) lbl(40))
getNumber(const(1) x(1))
return
lbl(40) allocateL3
createVariable(y(0))
definition(x(2) lbl(65) pid('' arity(0) pos('Oz' 11 2) nil
countUsedX(3)) unit [x(0) g(0) y(0)])
lbl(65) callBI('Thread.create' [x(2)]#nil)
putConstant(const(2) x(2))
inlineMinus(x(0) x(2) x(3))
move(x(1) y(1))
move(x(3) x(0))
createVariableMove(y(2) x(1))
call(g(0) arity(2))
moveMove(y(0) x(0) y(2) x(1))
inlinePlus(x(0) x(1) x(2))
unify(x(2) y(1))
deAllocateL3
return
Now we can see the way it maps to the new VM code:
Although the above includes the main code block as well as the Fibonacci
code block, the below is only the Fibonacci code block, obtained from
the fibcodeblock in the vm/main/main.cc source file.
The below does not look at all like the above - except in a couple of places.
(Bytecode 2)
// if N == 0
OpInlineEqualsInteger, 0, 0, 4,
This tests whether x(0) is equal to 0, like testNumber did. If it isn't,
we skip four elements in the bytecode (we don't have labels at this
level of abstraction)
// then
// Res = 0
OpUnifyXK, 1, 0,
this one is equivalent to the getNumber line. We don't have (yet) an
optimized opcode for unifying with an integer so we do something more
general, unifying x(1) with k(0) (see below for K registers)
OpReturn,
evident
// elseif N == 1
this is where we arrive if x(0)\=0
OpInlineEqualsInteger, 0, 1, 4,
same as above
// then
// Res = 1
OpUnifyXK, 1, 1,
same as above
OpReturn,
same as above
// else
OpAllocateY, 3,
equivalent to "allocateL 3". We chose to use Y everywhere for these
registers; the old VM was quite inconsistent.
OpCreateVarY, 0,
not too difficult
OpCreateAbstractionK, 0, 3, 3, 2,
we create the abstraction in x(2) with arity 0, using the codearea in
k(3) (the first 3 in the line), and having 3 G registers (the second 3).
The old VM code did essentially the same, except that it had the label
for where to continue (since the actual instructions for the procedure
were just after the "definition" opcode), some debugging info (code
position, identifier) and some extras we don't have/need yet (flags and
refs). It also had what to put in the new G registers, which is what we
do in the next three instructions.
OpArrayInitElementX, 2, 0, 0,
g(0) of the procedure in x(2) is initialized with x(0). This opcode will
also be used to initialize other constructs such as tuples, etc.
OpArrayInitElementG, 2, 1, 0,
g(1) of x(2) is g(0)
OpArrayInitElementY, 2, 2, 0,
g(2) of x(2) is y(0)
//OpCallBuiltin, 4, 1, 2,
actually this version doesn't create a thread (the "new thread" builtin
is in k(4), has arity 1 and takes x(2) as parameter)
OpMoveMoveXYXY, 0, 1, 1, 2,
we save x(0) and x(1) in y(1) and y(2)
OpCallX, 2, 0,
and call x(2) with an arity of 0 (arguments to procedure calls are
placed in the first x registers by calling convention and all x are
caller-saved)
OpMoveMoveYXYX, 1, 0, 2, 1,
we restore x(0) and x(1) from y(1) and y(2)
OpMoveKX, 2, 2,
we place k(2) (== -2) in x(2) as equivalent to the putConstant (again we
could have an optimized opcode here too)
OpInlineAdd, 0, 2, 3,
we don't have subtraction yet, so we do an addition with a negative
constant. Apart from that, it's a literal translation of the old code
OpMoveXY, 1, 1,
trivial
OpMoveXX, 3, 0,
trivial
OpCreateVarMoveY, 2, 1,
trivial
OpCallG, 0, 2,
trivial
OpMoveMoveYXYX, 0, 0, 2, 1,
trivial
OpInlineAdd, 0, 1, 2,
trivial
OpUnifyXY, 2, 1,
trivial
OpDeallocateY,
we don't need to know the size of the stackframe in the new VM
OpReturn,
trivial
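The "skip N elements" convention used instead of labels can be made concrete with a toy dispatch loop. Everything below (the opcode values, the operand layout, the choice of x(1) as result register) is a simplified assumption for illustration, not the real emulate.cc:

```cpp
#include <cstddef>
#include <vector>

// Toy dispatch loop, illustrative only.  When the
// OpInlineEqualsInteger test fails, the program counter jumps forward
// past the then-branch by the amount given in the third operand.
enum Op { OpInlineEqualsInteger, OpMoveKX, OpReturn };

int run(const std::vector<int>& code, std::vector<int>& x,
        const std::vector<int>& k) {
    std::size_t pc = 0;
    for (;;) {
        switch (code[pc]) {
        case OpInlineEqualsInteger: {
            int reg = code[pc + 1], value = code[pc + 2], skip = code[pc + 3];
            pc += 4;                          // past this instruction
            if (x[reg] != value) pc += skip;  // no labels at this level
            break;
        }
        case OpMoveKX:                        // x(dst) := k(src)
            x[code[pc + 2]] = k[code[pc + 1]];
            pc += 3;
            break;
        case OpReturn:
            return x[1];                      // result register, by convention
        }
    }
}
```

With code = {OpInlineEqualsInteger, 0, 0, 4, OpMoveKX, 0, 1, OpReturn, OpMoveKX, 1, 1, OpReturn}, the loop returns k(0) when x(0) is 0 and k(1) otherwise, mirroring the N == 0 test of Bytecode 2.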
Here are a few questions:
- I presumed we are staying true to the previous Mozart bytecode
generation. Why are there such great differences in structure?
Obviously I'm not referring to the understandable differences in
representation, i.e. Oz vs. C.
I hope most of them have been explained above. But ask for more details
if you want.
- Bytecode 1 doesn't make use of K registers whereas Bytecode 2 does -
why doesn't Bytecode 1 make use of K registers?
K registers are the way the new VM deals with compile-time constants.
The old VM just wrote them in the codearea amongst the opcodes and
integer arguments. This forced many algorithms in the old VM to do
code-walking to find all the nodes in a codearea. It was also
problematic for the bigger nodes, for which we have to guarantee
alignment. So we decided to move the constants to proper registers; as
they are constants, they were given the letter K.
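As a rough illustration of the K-register design (the struct and field names here are made up, not the actual VM's):

```cpp
#include <string>
#include <vector>

// Sketch of the K-register idea.  Instead of embedding tagged constant
// nodes between opcodes, the code area keeps every compile-time
// constant in a separate array.  A garbage collector can then reach all
// constants by iterating kRegisters, without decoding ("walking") the
// instruction stream, and the alignment of large nodes inside the code
// is no longer a concern.
struct Constant {
    std::string repr;  // stands in for an arbitrary Oz node
};

struct CodeArea {
    std::vector<int> code;             // opcodes and small integer operands only
    std::vector<Constant> kRegisters;  // compile-time constants live here
};

// What a collector does: visit each constant directly.
int traceConstants(const CodeArea& area) {
    int visited = 0;
    for (const Constant& k : area.kRegisters) {
        (void)k;  // a real collector would mark the node here
        ++visited;
    }
    return visited;
}
```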
- In Bytecode 2, is OpInlineEqualsInteger a hack? If not, what is wrong
with a simple OpInlineEquals (which would automatically handle transient
variables, unbound-to-unbound, strict equality and binding via
polymorphism for each datatype, i.e. floats, automatically)?
As Sébastien explained, nothing is wrong with it, but the old VM uses
testNumber for a good reason: the test is very common and the
specialized opcode is much faster.
- I recall (correctly?) that all builtins are inline. If we are
changing the naming conventions, why have "Inline" as part of the name?
For example OpInlineAdd - OpAdd should be just dandy.
Inlining is a bit overloaded with meanings in this discussion. By
realizing that some call is to a builtin, the Oz compiler can emit a
callBuiltin rather than a form of call suitable for abstractions. This
is a form of inlining, as it allows the compiler to not spill X registers
(the calling convention for builtins is that they don't touch registers
that they don't receive). However, the emulator has to find the C++
procedure corresponding to the builtin from information in the bytecode,
so the actual C++ code of the builtin can't be inlined in the emulator
loop. By having specific opcodes for common builtins, we can have
inlining all the way down to the C++ level, but we can only do that for
the most common builtins.
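The difference between the two paths can be sketched like this; all names below are invented for illustration:

```cpp
#include <vector>

// Sketch of the two builtin-call paths described above.  The generic
// path fetches a function pointer from the constant table at run time,
// so the C++ compiler cannot inline the builtin's body into the
// dispatch loop.  A dedicated opcode like OpInlineAdd performs the
// operation directly in the loop, where the compiler can inline it
// fully.
using Builtin = int (*)(int, int);

int addBuiltin(int a, int b) { return a + b; }

// Generic path: OpCallBuiltin looks the procedure up in k(idx).
int genericCallBuiltin(const std::vector<Builtin>& k, int idx, int a, int b) {
    return k[idx](a, b);  // indirect call through a pointer
}

// Specialized path: the addition is written straight into the emulator.
int opInlineAdd(int a, int b) {
    return a + b;
}
```

Both compute the same result; only the second gives the C++ compiler a chance to inline, which is why it is reserved for the most common builtins.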
- Are there other ways of obtaining oz bytecode for a given snippet of
oz code? I want to reverse engineer the bytecode with simpler examples
of oz code.
Emulator code is quite good for that, especially with the documents
above, but you have to be careful to make your code difficult enough to
optimize, as the Oz compiler is sometimes quite smart and may transform
simple examples beyond recognition.
- are we using gmp (http://gmplib.org/) for floats and other
datatypes? If so what datatypes?
No, only for (big) integers. Floating point numbers are always
approximations, no matter how many decimals we work with, so we just
use normal "double"s in the VMs (both the old and the new one). If one
wanted, defining "big" floats using GMP would certainly be possible, but
we haven't had demand for this.
Now that I'm thinking of it, I'm not sure, but I think bitarrays and
bitstrings might use GMP too.
- If there is no previous documentation, are we going to document the
bytecode? (which includes a map from the old version to the new
version - for sanity reasons)
There are no parts of the VM that we *want* to keep undocumented. But
time is a precious resource, so any help...
-- Now your diagrams - phew (for those who are suffering as much as I
am: http://scifac.ru.ac.za/compilers/cha03g.htm)
Good find! This style of diagram is old, but I thought it was more
widespread than it apparently is.
- Why are we using scala to bootstrap when we want to keep dependencies low?
We need a bootstrap compiler written in anything but Oz (or any language
depending on Oz, but that's not common). The obvious choice for
minimizing dependencies is C++, but writing a compiler (even a simple
one) in C++ is no piece of cake. It would have been my choice, but I
know that by the time Sébastien has a full compiler working, I would
only just have the framework for writing the parsing and rewriting
rules. The decision process was quite easy: to have a compiler working
soon, we needed to write it in a language with strong symbolic support
and pattern matching (Oz fits the bill nicely, but...), and the only
such language any of us had real expertise with was Scala, so Scala it
was. Once all is done and working, rewriting it in C++ would be a nice
exercise. In any case, it is a rather separate component without too
much interaction with the rest, so Scala won't "contaminate" the whole
project.
- I confess I don't understand the bottom-left diagram of the detailed
*.svg. What are ozc and ozc1?
- What does the ----------------------- in light blue represent?
|oz ozc ozf|
------- --------
| oz |
--------
- In fact, could you explain the whole bottom section - if you have the time :)
As Peter wrote, I think I should explain the whole diagram. I'll do that
in another message, this one is already starting to be long.
Thanks Yves and kudos on the design!
You are welcome, I did what I could...
Kind regards
Stewart
Yves
_________________________________________________________________________________
mozart-hackers mailing list
[email protected]
http://www.mozart-oz.org/mailman/listinfo/mozart-hackers