Sorry for the slow reply; I was busy.

Walter:

> Using the JVM forces your program into Java semantics.

Right, the JVM doesn't give all the freedom needed by a systems language.


> For example, there are no
> structs in the JVM bytecode. No pointers, either. Nor unsigned types.

The small emscripten compiler compiles LLVM "bytecode" (IR) to JavaScript, and 
despite the fact that JavaScript has no structs, it's able to compile them 
efficiently, lowering each one to a bunch of individual variables. The result 
seems efficient enough (taking into account that the target language is JS):
https://github.com/kripken/emscripten
This can't be used to return structs from a function, but I think it allows 
avoiding a few heap allocations (when the struct is a function argument, or 
it's used locally). The Oracle Java VM nowadays performs escape analysis, which 
I presume has similar effects.
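
To give a rough idea of the effect (a minimal Java sketch I made up to 
illustrate the point, not emscripten output): when a small aggregate never 
escapes a function it can be lowered to plain local variables, and HotSpot's 
escape analysis does something similar:

// 'Point' never escapes distance(), so escape analysis can scalar-replace
// it into plain local doubles, avoiding the heap allocations entirely.
final class Point {
    final double x, y;
    Point(double x, double y) { this.x = x; this.y = y; }
}

final class EscapeDemo {
    static double distance(double x1, double y1, double x2, double y2) {
        Point a = new Point(x1, y1);   // candidate for scalar replacement
        Point b = new Point(x2, y2);   // candidate for scalar replacement
        double dx = a.x - b.x;
        double dy = a.y - b.y;
        return Math.sqrt(dx * dx + dy * dy);
    }

    public static void main(String[] args) {
        System.out.println(distance(0, 0, 3, 4)); // prints 5.0
    }
}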

Even though there are no unsigned integers in the JVM, something is moving:
https://blogs.oracle.com/darcy/entry/unsigned_api
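
In the meantime the usual tricks work at the library level. A small sketch of 
what such helper operations boil down to (just illustrative code, not the API 
from that proposal):

// Treating an int as a 32-bit unsigned value with plain bit tricks,
// which is what library-level unsigned helpers amount to.
public final class UnsignedDemo {
    public static void main(String[] args) {
        int x = 0xFFFFFFFE;                 // -2 as signed, 4294967294 as unsigned

        // Widen to long, dropping the sign extension (an "unsigned widening").
        long asUnsigned = x & 0xFFFFFFFFL;
        System.out.println(asUnsigned);     // 4294967294

        // Unsigned division via the widened value.
        System.out.println(asUnsigned / 2); // 2147483647

        // Unsigned comparison: flipping the sign bit makes a signed compare
        // give the unsigned ordering.
        int y = 1;
        System.out.println((x ^ 0x80000000) > (y ^ 0x80000000)); // true
    }
}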


> Your new language is fairly boxed in to being a rehash of Java semantics.

The semantics of languages like JRuby and Scala are different from Java 
semantics; the Scala type system is not comparable to the Java one. But I 
understand what you are saying: I think there is more than one meaning of 
"language semantics".


> That works if your language is expressible as C, because LLVM is a C/C++ back
> end. If your language has different semantics (like how Go does stacks), using
> LLVM can be a large problem.

LLVM is not infinitely flexible, I agree, but it is already quite broad and 
growing, so even for unusual languages LLVM seems able to implement a large 
percentage of the semantics out of the box. Recently a Haskell back-end was 
implemented on top of LLVM, and LLVM was not able to express everything needed, 
so the Haskell devs wrote a small patch (2000 lines or so) to implement the 
missing semantics, and it was merged into the main LLVM code.

In the discussion Andrei was talking about the amount of code needed to 
implement an efficient base for a new language. Even if LLVM is sometimes not 
able to cover 100% of the semantics of your language, I think writing just the 
few missing parts reduces the work needed a lot.


> I looked into this years ago. Very little of array bounds checking can be
> optimized away.

This paper (about Java) shows nice results in terms of the percentage of array 
bounds checks eliminated (though, as expected, the speedup is visible only in 
code that uses arrays heavily, like scientific-style code):
http://ssw.jku.at/Research/Papers/Wuerthinger07/

So are there any low-hanging fruit here?
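
The simplest case is loops like this one (a trivial Java sketch of mine), where 
the loop condition already proves the index is in range, so the per-access 
check can be removed:

// The condition guarantees 0 <= i < a.length, so a JIT doing
// bounds-check elimination can drop the implicit check on a[i].
public final class SumDemo {
    static long sum(int[] a) {
        long s = 0;
        for (int i = 0; i < a.length; i++) {
            s += a[i];    // bounds check provably redundant here
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[]{1, 2, 3, 4})); // prints 10
    }
}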


> I've been working on optimizers for 25 years now, including a
> native code generating Java compiler, and I do know a few things about how to 
> do
> arrays.

You are one of my few programming heroes :-) But there's always something more 
to invent and learn.


> Clang has some pretty good ideas, like the spell checker on undefined 
> identifiers.

This is another idea I'd like to see in D:
http://d.puremagic.com/issues/show_bug.cgi?id=5004
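
The core of the idea is small; here is a toy Java sketch of it (not the Clang 
or DMD code, just picking the closest known identifier by edit distance):

import java.util.List;

// "Did you mean ...?" for an undefined identifier: suggest the known
// name with the smallest Levenshtein distance, if it's close enough.
public final class SpellCheckDemo {
    static int editDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++)
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        return d[a.length()][b.length()];
    }

    static String suggest(String unknown, List<String> knownNames) {
        String best = null;
        int bestDist = Integer.MAX_VALUE;
        for (String name : knownNames) {
            int dist = editDistance(unknown, name);
            if (dist < bestDist) { bestDist = dist; best = name; }
        }
        return (best != null && bestDist <= 2) ? best : null; // 2 is an arbitrary cutoff
    }

    public static void main(String[] args) {
        System.out.println(suggest("writelm", List.of("writeln", "length", "toString")));
        // prints: writeln
    }
}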

Bye,
bearophile
