On 11 Apr 2011, at 10:38, Quentin Mathé wrote:

> Le 4 avr. 2011 à 00:05, David Chisnall a écrit :
> 
>> I've committed this code in Languages/ObjC2JS.  It builds as a clang plugin, 
>> which uses clang to generates an abstract syntax tree and then walks it 
>> emitting JavaScript.
> 
> I was wondering what's your take about the LLVM bitcode interpreter approach 
> taken by the Emscripten project. See 
> https://github.com/kripken/emscripten/wiki

It's an interesting project, but it's not really an interesting approach.  I 
looked at compiling LLVM bitcode to JavaScript first, but it's just a horrible 
match.  Neither LLVM's basic memory primitives nor its basic flow-control 
primitives have analogues in JavaScript, which makes the translation quite 
messy.
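To give a flavour of the mismatch, here's a hand-written sketch of what a 
bitcode-style translation ends up looking like (hypothetical names, not 
Emscripten's actual output): flat memory becomes one big array, and basic 
blocks with branches become a label variable driving a switch inside a loop.

```javascript
// Sketch of an LLVM-style translation of:  int sum(int *ptr, int n)
// LLVM's flat memory becomes a single array; every load/store indexes it.
// Its basic blocks become cases in a switch, driven by a label variable.
var HEAP = new Int32Array(1024);

function sum(ptr, n) {
  var i = 0, acc = 0, label = 1;
  while (true) {
    switch (label) {
      case 1:                    // entry / loop-condition block
        if (i < n) { label = 2; } else { label = 3; }
        break;
      case 2:                    // loop body: acc += ptr[i]
        acc = (acc + HEAP[ptr + i]) | 0;  // truncate to 32 bits, as i32 add does
        i = (i + 1) | 0;
        label = 1;
        break;
      case 3:                    // exit block
        return acc;
    }
  }
}

HEAP[0] = 10; HEAP[1] = 20; HEAP[2] = 30;
sum(0, 3);  // 60
```

Neither the switch-in-a-loop control flow nor the giant heap array is 
something a JavaScript VM is tuned to optimise.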

They also have the disadvantage that they throw away the high-level semantics 
from the start.  If we used this approach, we'd compile libobjc2 to bitcode, 
then compile Objective-C to bitcode, then interpret this in JavaScript.  A 
method lookup would end up being a few dozen LLVM instructions, all of which 
would be interpreted.  A good JS VM would JIT-compile the interpreter, but JIT 
compiling the interpreted code is an order of magnitude harder.  They use the 
Closure Compiler to do this transform (which means that their compiler is two 
or three orders of magnitude more complex than mine), but it still wouldn't be 
able to map the Objective-C object model into the JavaScript one.  The method 
lookup uses some shift and lookup operations that don't mesh well with the 
JavaScript model for integers, nor with the JavaScript memory model, so even a 
good compiler going from this level will generate bad code.

In contrast, a message send in my code is a lookup in a JavaScript dictionary 
and a call.  Both of these are primitive JavaScript operations and, more 
importantly, they are ones whose performance affects every single JavaScript 
program.  A half-decent JavaScript implementation will do things like inline 
method calls from my approach.
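Roughly like this (the names here are illustrative, not the actual ObjC2JS 
output):

```javascript
// Sketch: an Objective-C message send mapped onto primitive JS operations.
// A class is a JS object whose "methods" property is a dictionary keyed by
// selector; a message send is one property lookup plus one call.
function objc_msgSend(receiver, selector /*, args... */) {
  var method = receiver.isa.methods[selector];  // dictionary lookup
  return method.apply(receiver, Array.prototype.slice.call(arguments, 2));
}

var MyClass = {
  name: "MyClass",
  methods: {
    "description": function () { return "<" + this.isa.name + ">"; }
  }
};

var obj = { isa: MyClass };
objc_msgSend(obj, "description");  // "<MyClass>"
```

Both steps are exactly the operations a JS VM already specialises and inlines 
for ordinary JavaScript method calls.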

>> From what I understand, there is more room for optimization when emitting 
>> JS code from the Clang AST. For example, C or ObjC loop constructs can be 
>> remapped to JS loops, and ObjC message sends may be easier to optimize 
>> (e.g. by inserting type-feedback optimizations written in JS).
> 
> However, this means that another code generator will have to be written to 
> compile the LK languages to JS, when Emscripten would involve no extra code 
> generator.

It requires about one line of code for each AST node - hardly any effort at 
all.  LK's AST is much simpler than clang's, and clang's was pretty easy.

> Given that they claim there is a lot of room for optimization in their 
> approach (see 
> https://github.com/kripken/emscripten/raw/8a6e2d67c156d9eaedf88b752be4d1cf4242e088/docs/paper.pdf),
>  why did you choose to emit JS code from the Clang AST?

In general, it's always better to give the optimiser as much information as 
possible, so generating JavaScript code that is semantically similar to the 
source is better.

Oh, and if their code really does work as they describe in the paper, then 
it's not just wrong, it's badly wrong.  Things like overflow semantics in C 
are not correctly modelled (they are in mine).
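To illustrate the point: C's modular 32-bit arithmetic has to be emulated 
explicitly in JavaScript, whose numbers are doubles.  A sketch of the general 
technique (not ObjC2JS's actual output):

```javascript
// C: uint32_t x = 4294967295u; x = x + 1;   // wraps to 0
// JavaScript numbers are doubles, so naive addition does not wrap:
var naive = 4294967295 + 1;             // 4294967296 - wrong for uint32_t

// Emulating C's unsigned modular arithmetic with an explicit truncation:
var wrapped = (4294967295 + 1) >>> 0;   // 0 - matches the C semantics

// 32-bit two's-complement truncation uses | 0 instead:
var signedWrap = (2147483647 + 1) | 0;  // -2147483648
```

A translator that omits these truncations produces code that silently 
diverges from the C semantics.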

> Would message send optimizations be possible with Emscripten? Just harder 
> than with your approach?

It's not really possible.  I have a very lightweight Objective-C runtime 
implemented in JavaScript, which makes Objective-C objects into JavaScript 
objects and just adds a lightweight class model on top.  All instance variable 
and method lookups are done as pure JavaScript slot lookups.  This makes it 
trivial for the JavaScript VM to optimise the code - it just needs to use the 
same optimisations as it uses for normal JavaScript.  In effect, it means that 
the compiler doesn't need to do any optimisations, it just needs to give the JS 
VM enough information to be able to do them.
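Concretely, something along these lines (illustrative, not the actual runtime 
code):

```javascript
// Sketch: Objective-C objects as plain JS objects with a class model on top.
// Instance variables are ordinary properties, so every ivar access is a
// native slot lookup that the JS VM already knows how to optimise.
function Point() {
  this.x = 0;   // ivar -> plain JS property
  this.y = 0;   // ivar -> plain JS property
}
Point.prototype.isa = {
  name: "Point",
  methods: {
    "moveBy:y:": function (dx, dy) { this.x += dx; this.y += dy; }
  }
};

var p = new Point();
p.isa.methods["moveBy:y:"].call(p, 3, 4);  // message send = lookup + call
// p.x is now 3, p.y is now 4
```

Because every object of a class has the same property layout, the VM's hidden-
class machinery applies to Objective-C objects for free.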

It also lets us do some nice high-level optimisations.  For example, we can 
replace GSString and GSArray with JSString and JSArray, which work by just 
adding an isa field to Array.prototype and String.prototype, meaning that all 
of the effort that goes into optimising the array and string implementations in 
JavaScript will directly benefit us - any code that uses NSArray or NSString 
gets to be (almost) as fast as code using JavaScript arrays and strings 
directly.
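The trick is roughly this (a sketch with a hypothetical class object, not the 
real JSArray implementation):

```javascript
// Sketch: bridging NSArray onto native JS arrays by putting an isa on
// Array.prototype, so every JS array is also an Objective-C object.
var JSArrayClass = {
  name: "JSArray",
  methods: {
    "count":          function ()  { return this.length; },
    "objectAtIndex:": function (i) { return this[i]; }
  }
};
Array.prototype.isa = JSArrayClass;

var a = [10, 20, 30];                        // a native JS array...
a.isa.methods["count"].call(a);              // 3  - ...answering NSArray messages
a.isa.methods["objectAtIndex:"].call(a, 1);  // 20
```

The storage is the VM's own array implementation; only the dispatch goes 
through the Objective-C class model.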

> Could your choice also be related to the fact that LLVM seems to lack some 
> elements needed to build a VM that interprets LLVM bitcode? Especially when 
> you 
> consider the live optimizations (e.g. profiling and inlining method calls 
> while running) that a Java VM supports?

Essentially, the problem is translating a high-level language (e.g. 
Objective-C) into another high-level language (JavaScript).  Going via a 
low-level 
language (LLVM IR) means that you throw away a lot of useful information and 
then have to try to re-infer it later.  It's a fundamentally flawed approach.  
It will work, if you put enough effort into it, but that doesn't make it 
sensible.

David

-- Sent from my Difference Engine

_______________________________________________
Etoile-dev mailing list
Etoile-dev@gna.org
https://mail.gna.org/listinfo/etoile-dev