Hi Krys,
On August 3, 2016 3:41:46 PM PDT, Krystal Mok <rednaxel...@gmail.com> wrote: >(Having missed last night's dinner and all the fun there...) > >When Rémi started talking about this interpreter today, I thought it'd >be >tailcall related. I would love that it was tailcall related too. people seams to reduce tailcall to a feature of some obscure languages but in reality each time you have to do some argument transformations/injections/checking in a generic way, it screams for tailcall. > >In college, my introductory course to computer systems was based on the >book "Introduction to Computing Systems: From Bits & Gates to C & >Beyond", >where it introduces a small educational 16-bit architecture called >LC-2. We >were given a LC-2 simulator to run our "assembly" programs. >Later on, when I've learned enough Java to be dangerous, I wanted to >write >a more powerful simulator (interpreter) of the LC-2 ISA in Java. > >My first version was a most straightforward switch-threaded one. >Nothing >much to be said about that. > >The next version, wanting to mimic a indirect-threaded style, was >structured in very much the same way Rémi presented in the link, but >MethodHandle was still years from available so I didn't have the luxury >of >using it. Instead, I modeled the opcode handlers as "single-method >objects", and do a virtual call at the end of each handler. >(So, instead of Rémi's MH calls, I'm just using a table lookup + >virtual >call) > >public class NopHandler extends AbstractHandler // or implements >Handler >{ > public void execute(State s) { > HANDLERS[s.code[s.pc++]].execute(s); // dispatch > } >} > >That's didn't work out quite well. The performance was okay-ish with >small >program inputs. But HotSpot's JIT compilers doesn't implement tail call >optimizations (well, recursive inlining sort of counts...), so with a >long-enough program the stack blows up. And it didn't take that long of >a >LC-2 assembler program to make it blow up. > To avoid of blowing the stack, the trick is to stop the "trace" at the end of a basic block, i.e. when you have a jump or a return. It also means that things like null check, things that should not jump, has to be encoded with a different bytecode that conditional jump because you don't want to stop the trace for them. >I realized that I had to have proper tail calls to make it work. So I >added >a central dispatch loop to serve as a trampoline, e.g. > >Handler next = getNextHandler(); >while (next != null) { > next = next.execute(s); >} > >where my Handler.execute() would still perform the lookup, but not the >actual invocation. As expected, the performance was no where as good as >directly making the virtual call at the tail position in each handler. >This >central dispatch loop makes the unpredictable-ness of the >Handler.execute() >significant. > >Need a good trampoline. Need proper tail calls. YES ! > >- Kris Rémi > >On Wed, Aug 3, 2016 at 3:14 PM, John Rose <john.r.r...@oracle.com> >wrote: > >> On Aug 3, 2016, at 1:22 PM, Remi Forax <fo...@univ-mlv.fr> wrote: >> > >> > Charles ask if we can make a fast register interpreter in Java, >> > here is my take on a threaded-like interpreter >> > >> > https://gist.github.com/forax/f38b533e089217cfc4d0ae3c6e2de9c9 >> >> Nicely done. We were talking last night at dinner about making >> bytecode interpreters go fast, and this is helpful. >> >> I think there may be a combinator here, for bytecode dispatch loops. >> (This feels like the PIC combinator, both fundamental and tricky to >get >> right.) >> A "bytecode" dispatch combinator might provide a pattern like the >> following: >> >> MethodHandle init, pred, step, fini, index; >> @NonNull MethodHandle[] tab; >> R dispatch(A… a) { >> V v = init(a…); // one here; there might be many v… >> while (pred(a…)) { >> v = step(v, a…); >> // execute one "instruction": >> int i = index(v, a…); >> tab[i].invokeExact(v, a…); >> } >> return fini(V, a…); >> } >> >> The explicit table would be recopied internally to an @Stable array >for >> constant folding. >> The various index values could be tracked and turned into traces (or >some >> other profile >> driven structure) which would be compiled together, in specialized >> segments of the unrolled >> interpreter loop. >> >> Or maybe just the table lookup part alone makes a good combinator, to >be >> paired >> with the loop combinators? >> >> Comments welcome… >> >> — John >> >> P.S. Another new RFE: >> experimental hook for creating heisenboxes >> https://bugs.openjdk.java.net/browse/JDK-8163133 >> _______________________________________________ >> mlvm-dev mailing list >> mlvm-dev@openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev >> > > >------------------------------------------------------------------------ > >_______________________________________________ >mlvm-dev mailing list >mlvm-dev@openjdk.java.net >http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev -- Sent from my Android device with K-9 Mail. Please excuse my brevity. _______________________________________________ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev