On 11 Apr 2011, at 10:38, Quentin Mathé wrote:

> On 4 Apr 2011, at 00:05, David Chisnall wrote:
>
>> I've committed this code in Languages/ObjC2JS. It builds as a clang plugin,
>> which uses clang to generate an abstract syntax tree and then walks it,
>> emitting JavaScript.
>
> I was wondering what your take is on the LLVM bitcode interpreter approach
> taken by the Emscripten project. See
> https://github.com/kripken/emscripten/wiki

It's an interesting project, but it's not really an interesting approach. I
looked at compiling LLVM bitcode to JavaScript first, but it's just a horrible
match. Neither the basic memory primitives nor the basic flow-control
primitives in LLVM have analogues in JavaScript, which makes it quite messy.
It also has the disadvantage of throwing away the high-level semantics from
the start. If we used this approach, we'd compile libobjc2 to bitcode, then
compile Objective-C to bitcode, and then interpret all of this in JavaScript.
A method lookup would end up being a few dozen LLVM instructions, all of which
would be interpreted. A good JS VM would JIT-compile the interpreter, but
JIT-compiling the interpreted code is an order of magnitude harder. They use
the Closure Compiler to do this transform (which means that their compiler is
two or three orders of magnitude more complex than mine), but it still
wouldn't be able to map the Objective-C object model onto the JavaScript one.
The method lookup uses shift and lookup operations that mesh well neither with
the JavaScript model for integers nor with the JavaScript memory model, so
even a good compiler starting from this level will generate bad code.

In contrast, a message send in my code is a lookup in a JavaScript dictionary
and a call. Both of these are primitive JavaScript operations and, more
importantly, they are operations whose performance affects every single
JavaScript program. A half-decent JavaScript implementation will do things
like inline the method calls generated by my approach.
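To make that concrete, here is roughly the shape a send takes in the generated
code (the helper name objc_msgSend and the isa/methods layout below are
illustrative, not the actual ObjC2JS output):

  // Illustrative only. A message send is a property lookup on the
  // receiver's class dictionary followed by an ordinary call - both
  // primitive JavaScript operations that every JS VM already optimises.
  function objc_msgSend(receiver, selector /*, args... */) {
    var method = receiver.isa.methods[selector];          // the lookup
    return method.apply(receiver,                         // the call
                        Array.prototype.slice.call(arguments, 2));
  }

  // [view setFrame: r] would come out as something like:
  //   objc_msgSend(view, 'setFrame:', r);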
> From what I understand, there is more room for optimizations by emitting JS
> code from the Clang AST. For example, C or ObjC loop constructs can be
> remapped to JS loops, and ObjC message sends might be easier to optimize
> (e.g. by inserting type feedback optimizations written in JS).
>
> However, this means that another code generator will have to be written to
> compile the LK languages to JS, whereas Emscripten would involve no extra
> code generator.

It requires about one line of code for each AST node - hardly any effort at
all. LK's AST is much simpler than clang's, and clang's was pretty easy.

> Given that they claim there is a lot of room for optimization in their
> approach (see
> https://github.com/kripken/emscripten/raw/8a6e2d67c156d9eaedf88b752be4d1cf4242e088/docs/paper.pdf),
> why did you choose to emit JS code from the Clang AST?

In general, it's always better to give the optimiser as much information as
possible, so generating JavaScript code that is semantically similar to the
source is better. Oh, and if their code really does work as described in the
paper, then it's not just wrong, it's badly wrong. Things like overflow
semantics in C are not modelled correctly (they are with mine).

> Would message send optimizations be possible with Emscripten? Just harder
> than with your approach?

It's not really possible. I have a very lightweight Objective-C runtime
implemented in JavaScript, which turns Objective-C objects into JavaScript
objects and just adds a lightweight class model on top. All instance variable
and method lookups are done as pure JavaScript slot lookups. This makes it
trivial for the JavaScript VM to optimise the code - it just needs to apply
the same optimisations that it uses for normal JavaScript. In effect, the
compiler doesn't need to do any optimisations itself; it just needs to give
the JS VM enough information to be able to do them.

It also lets us do some nice high-level optimisations. For example, we can
replace GSString and GSArray with JSString and JSArray, which work by just
adding an isa field to Array.prototype and String.prototype. This means that
all of the effort that goes into optimising the array and string
implementations in JavaScript directly benefits us - any code that uses
NSArray or NSString gets to be (almost) as fast as code using JavaScript
arrays and strings directly.
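A minimal sketch of what that kind of runtime looks like (the makeClass helper
and the class definitions here are mine, for illustration - not the real
runtime API):

  // A class is just a JavaScript object; its method dictionary chains to
  // the superclass's via the prototype chain, so an inherited method is
  // still found by a single slot lookup.
  function makeClass(name, superclass, methods) {
    var cls = { name: name, superclass: superclass };
    cls.methods = Object.create(superclass ? superclass.methods : null);
    for (var sel in methods) { cls.methods[sel] = methods[sel]; }
    return cls;
  }

  var NSObject = makeClass('NSObject', null, {
    'description': function () { return '<' + this.isa.name + '>'; }
  });

  // A hypothetical JSArray-style bridge: give Array.prototype an isa slot
  // and every native JavaScript array becomes an NSArray instance.
  var NSArray = makeClass('NSArray', NSObject, {
    'count': function () { return this.length; },
    'objectAtIndex:': function (i) { return this[i]; }
  });
  Array.prototype.isa = NSArray;

  // Sending -count to what the Objective-C code thinks is an NSArray is
  // now a plain slot lookup plus a call on a native array:
  var n = [1, 2, 3].isa.methods['count'].call([1, 2, 3]);   // n === 3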
> Could your choice also be related to the fact that LLVM seems to lack some
> of the elements needed to build a VM that would interpret the LLVM bitcode?
> Especially when you consider the live optimizations (e.g. profiling and
> inlining method calls while running) that a Java VM supports?

Essentially, the problem is translating a high-level language (e.g.
Objective-C) into another high-level language (JavaScript). Going via a
low-level language (LLVM IR) means that you throw away a lot of useful
information and then have to try to re-infer it later. It's a fundamentally
flawed approach. It will work if you put enough effort into it, but that
doesn't make it sensible.

David

-- Sent from my Difference Engine

_______________________________________________
Etoile-dev mailing list
Etoile-dev@gna.org
https://mail.gna.org/listinfo/etoile-dev