Re: Battle-plan for CTFE

Martin Nowak via Digitalmars-d-announce Mon, 16 May 2016 06:16:21 -0700

On 05/15/2016 03:55 PM, Stefan Koch wrote:
> 
> About the whole to BC or not to BC discussion.
> As Daniel already outlined, the main purpose is not speed, but having a
> simple lowered representation to interpret.
> Many AST interpreters switched to a byte-code because it's much easier
> to maintain.
>


I just don't buy the argument for using BC.

In order to generate linearized bytecode you'll have to fully walk
(a.k.a. interpret) your AST, collect and fix up variable references
and jump addresses, precompute frame sizes, and still maintain a
dedicated stack and heap for variables (b/c you don't want to use D's GC
to maintain the lifetime, and using raw pointers will easily result in
time-consuming memory corruptions).
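
To make the comparison concrete, here is a minimal sketch, not taken from
the compiler: a toy AST with only integer constants and a conditional, one
direct interpreter, and one bytecode generator for the same tree. All names
(Node, Const, Cond, Instr, compile, run) are hypothetical. The generator
walks the tree just like the interpreter does and additionally has to
backpatch its jump targets.

```d
// Hypothetical mini-AST: integer constants and a ternary conditional.
abstract class Node {}

final class Const : Node
{
    long value;
    this(long v) { value = v; }
}

final class Cond : Node
{
    Node test, yes, no;
    this(Node t, Node y, Node n) { test = t; yes = y; no = n; }
}

// Direct AST interpretation: one recursive walk, no fixups.
long interpret(Node n)
{
    if (auto c = cast(Const) n)
        return c.value;
    auto k = cast(Cond) n;
    assert(k !is null);
    return interpret(k.test) != 0 ? interpret(k.yes) : interpret(k.no);
}

enum Op : ubyte { push, jumpIfZero, jump }

struct Instr
{
    Op op;
    long arg;
}

// Linearizing the same tree: also one full walk, plus jump backpatching.
void compile(Node n, ref Instr[] code)
{
    if (auto c = cast(Const) n)
    {
        code ~= Instr(Op.push, c.value);
        return;
    }
    auto k = cast(Cond) n;
    assert(k !is null);
    compile(k.test, code);
    const jz = code.length;
    code ~= Instr(Op.jumpIfZero, 0);        // target not known yet
    compile(k.yes, code);
    const jmp = code.length;
    code ~= Instr(Op.jump, 0);              // target not known yet
    code[jz].arg = cast(long) code.length;  // fix up: start of else branch
    compile(k.no, code);
    code[jmp].arg = cast(long) code.length; // fix up: end of conditional
}

// Minimal bytecode loop with its own operand stack.
long run(const Instr[] code)
{
    long[] stack;
    size_t pc = 0;
    while (pc < code.length)
    {
        const i = code[pc++];
        final switch (i.op)
        {
            case Op.push:
                stack ~= i.arg;
                break;
            case Op.jumpIfZero:
                const v = stack[$ - 1];
                stack = stack[0 .. $ - 1];
                if (v == 0)
                    pc = cast(size_t) i.arg;
                break;
            case Op.jump:
                pc = cast(size_t) i.arg;
                break;
        }
    }
    return stack[$ - 1];
}
```

Both paths evaluate the same tree to the same result; the bytecode path
simply needs the extra compile step and a separate execution loop.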

This effort might be worthwhile for hot loops if you wanted to generate
simple asm code so that the CPU can do the interpretation (2nd step
after AST interpreter in [¹]). But at this point this would be a
premature optimization.
Clearly a BC interpreter is both more complex than an AST interpreter
and less optimal than a simple JIT.

Sure, supporting D's pointer arithmetic in an interpreter will be
challenging, but you'll have to solve the very same problem for a
bytecode interpreter (how to maintain ownership when pointing to a part
of a struct).
The simple but bad solution, using raw pointers and relying on D's GC,
would work for both interpreters.

Another simple but RC-friendly solution is to compose everything from
reference Values, e.g. a struct or array being modeled as RC!Value[],
and a hash as RC!Value[RC!Value].

One could even do a hybrid between value and reference type by lazily
moving values onto the heap when needed.

struct Value
{
  static struct Impl
  {
    mixin(bitfields!(
      uint, "refCount", 31,
      bool, "onHeap", 1));

    union
    {
      dinteger_t int_;
      uinteger_t uint_;
      real_t real_;
      String str_;
      Value[] array;
      Impl* heap;
    }
  }
  Impl impl;

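  // check impl.onHeap directly; a bare `onHeap` would resolve through
  // `alias get this` and recurse into get() forever
  ref Impl get() { return impl.onHeap ? *impl.heap : impl; }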
  alias get this;

  void moveToHeap()
  {
    if (impl.onHeap)
      return;
    auto p = heapAllocator.alloc!Impl;
    *p = impl;
    p.refCount = 1;
    impl.heap = p;
    impl.onHeap = true;
  }

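  this(this)
  {
    // copies share the heap cell, so each copy must be counted
    if (impl.onHeap)
      impl.heap.refCount = impl.heap.refCount + 1;
  }

  ~this()
  {
    if (!impl.onHeap)
      return;
    // bitfield accessors are properties, so decrement via get/set
    impl.heap.refCount = impl.heap.refCount - 1;
    if (impl.heap.refCount == 0)
      heapAllocator.free(impl.heap);
  }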
}

auto heapAllocator = FreeList!(AllocatorList!(
    (size_t n) => Region!Mallocator(max(n, 1024 * 1024))
))();

//...
class Interpreter
{
  //...
  void visit(PtrExp e)
  {
    stack.push(interpret(e.e1, istate, ctfeNeedLvalue));
  }

  void visit(IndexExp e)
  {
    accept(e.e1);
    auto v1 = stack.pop();
    accept(e.e2);
    auto v2 = stack.pop();

    // only promote the element when a reference to it is requested
    if (ctfeGoal == ctfeNeedLvalue)
      v1.array[v2.uint_].moveToHeap();
    stack.push(v1.array[v2.uint_]);
  }
//...
}

That way you could benefit from memory efficient value types while still
being able to take references from any element.
It should be possible to do the same with references/pointers to stack
variables, but you'd want to check that refCount == 1 when a stack frame
gets cleaned up.
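
A minimal sketch of that frame-cleanup check, under simplifying
assumptions: Cell, Frame, retain and release are hypothetical names, the
cells carry only an integer payload, they are allocated with plain `new`
instead of heapAllocator, and counting is done by hand rather than through
the Value destructor.

```d
// Hypothetical heap cell for a stack variable that had its address taken.
struct Cell
{
    uint refCount;
    long payload;
}

// Manual reference counting, kept explicit for clarity.
Cell* retain(Cell* c)
{
    ++c.refCount;
    return c;
}

void release(Cell* c)
{
    assert(c.refCount > 0);
    --c.refCount;
}

struct Frame
{
    Cell*[] locals; // heap-promoted locals owned by this frame

    Cell* addLocal(long v)
    {
        auto c = new Cell(1, v); // the frame holds the first reference
        locals ~= c;
        return c;
    }

    // True if the frame is the sole owner of every local, i.e. no
    // reference to a stack variable outlives the frame.
    bool canCleanUp() const
    {
        foreach (c; locals)
            if (c.refCount != 1)
                return false;
        return true;
    }
}
```

If canCleanUp() returns false when a function returns, the interpreted
code has let a reference to one of its locals escape, which an interpreter
could report as an error instead of leaving a dangling reference.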

[¹]: http://dconf.org/2013/talks/chevalier_boisvert.pdf p. 59 ff
