On 05/13/2016 06:32 PM, Stefan Koch wrote: > I would like to work on a solution that does scale. The Problem is > not making a byteCode-interpreter. That part is relatively easy. > Currently I am trying to get a detailed understanding of dmd and > it's data-structures. (mainly it's AST.) > > Generating the byte-code seems to be non-trivial. > > I wonder in how far the glue layer can be of help...
Seems like I've to repeat this once more, b/c everyone including me didn't got it in the first place. We don't need a bytecode interpreter, it mostly adds overhead and a complicated second layer between walking the AST and interpreting it (think of transforming a for loop with goto into linear bytecode, almost as complicated as in the glue layer). What we basically need is a stack of values, a stack of frames (for function calls and variables in scopes), and an AST visitor that does the interpretation. It would be most helpful for the success of this to follow common CS examples like [¹], [²], or [³]. With realistic expectations we might have a working implementation in a month or so. With more requirements like bytecode, using dmd's backend, or JIT we end up with a long newsgroup discussion ;). Tricky things for a CTFE interpreter include: - enumerating VarDeclarations (they already have a ctfeAdrOnStack field) in each scope, and referring to outer variables from nested scopes At best just use a continuous stack and just set the stack pointer to the last frame pointer when leaving a scope. - getting function calls right Push arguments, on return shift top of stack under arguments and pop them (caller cleanup). If possible detect and support tail recursion. - converting AST values to and from Interpreter Values. Literals and constant VarExp from the AST need to be converted to an interpreter Value before being pushed on the stack. The result of interpretation (top of stack) needs to be converted back to an AST literal. Using separate data types (instead of creating AST values in the interpreter) will be a major performance improvement over using AST values (e.g. 16 vs. ~100 bytes). It also creates a clear boundary between Interpreter and AST values. Currently quite some complexity is thrown at cleaning interpreter generated AST values, and distinguishing interpreter owned from normal AST values (which allows some optimizations) [⁴]. We don't need a tagged union b/c the AST already contains the type information, but a tag could be helpful for debugging [⁵]. Values can be deleted when popped from stack (no ref counting needed I think). - Implementing more complex data structures (arrays, strings, hash tables, aggregates) Use Value[], Value[Value], and a dedicated String (char[]/wchar[]/dchar[]). For structs/classes field indexes are known => use fix-sized Value[]. Value.class_ needs to hold a reference to the actual class instance for vtable calls. Last time I was working on this (also on a bytecode interpreter) the entry point was fairly clear [⁶] (thanks to Don). -Martin [¹]: [The Interpreter In An Undergraduate Compilers Course ](http://arxiv.org/pdf/1412.0426.pdf) [²]: [L8: Interpreters & Visitors](http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-005-elements-of-software-construction-fall-2011/lecture-notes/MIT6_005F11_lec08.pdf) [³]: [PA 2: Interpreter](https://sites.google.com/a/bodik.org/cs164/projects/pa2) [⁴]: https://github.com/dlang/dmd/search?q=ownedByCtfe [⁵]: https://github.com/MartinNowak/dmd/blob/28ffb0ab4fa6950f60c085f33f8a2ce23df7c0cd/src/interpret.c#L73 [⁶]: https://github.com/MartinNowak/dmd/blob/28ffb0ab4fa6950f60c085f33f8a2ce23df7c0cd/src/interpret.c#L693