Re: [jvm-l] Why not move thru .java before bytecode?

Jochen Theodorou Tue, 24 Nov 2009 00:16:07 -0800

[email protected] wrote:
>  
> Some jvm languages have problems left to solve:
[...]


Just to add some points... I think the main question here is whether you 
decide to go with the Java model or not. For example, if the bytecode you 
output is just there to control an interpreter, then breakpoints on the 
bytecode level might be useless. Such a language may have its own debugger 
and a "foreign" stack, if any: "foreign" in the sense that the Java stack 
is not really part of the stack of your language. Such a language usually 
doesn't use the Java object model either. I think early JRuby might be a 
good example of such a language, but I cannot tell for sure. Charles may 
know more about that, of course.

Such a language usually also has the problem of wrapping each and every 
object of the language, or of accessing the objects through 
language-specific interfaces. If classic Java classes are generated at 
all, then usually only to be able to interact with Java itself. Extending 
classes and implementing interfaces in a round-trip manner is usually a 
pain here. Just think of overriding an overloaded method.
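To make the overloading point concrete, here is a minimal Java sketch (the class names are hypothetical). javac chooses between overloads at compile time from the static argument types, so a subclass that overrides only `print(Object)` is bypassed when an `int` is passed. A language with its own dispatch rules that generates subclasses of Java classes has to decide which overloads to override and how its own calls map onto them.

```java
// Base class with two overloads, as a Java library might define it.
class Printer {
    String print(int value) {
        return "Printer.print(int): " + value;
    }

    String print(Object value) {
        return "Printer.print(Object): " + value;
    }
}

// A subclass that overrides only one of the two overloads.
class ObjectOnlyPrinter extends Printer {
    @Override
    String print(Object value) {
        return "ObjectOnlyPrinter.print(Object): " + value;
    }
}

class OverloadDemo {
    public static void main(String[] args) {
        Printer p = new ObjectOnlyPrinter();
        // javac selects print(int) here (no boxing needed), which is not overridden.
        System.out.println(p.print(42));
        // A String argument selects print(Object), which is overridden.
        System.out.println(p.print("text"));
    }
}
```

Running `OverloadDemo.main` prints the base class's `int` version first and the overriding `Object` version second.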

Now Groovy is different, because Groovy uses the same stack as Java, the 
same classes, and almost the same object model. So of course you can use 
any Java debugger to debug Groovy as well. Since the changes to the 
object model should be understood as an "add-on", Groovy does not need 
special interfaces to work. So wrappers of any kind are normally not needed.

Wrappers are needed when using reflection and, of course, for primitive 
types. There is of course still the numeric math problem, which is 
bigger than we initially thought. Currently the operand stack in a 
method will contain only objects, including boxed integers, for example. 
Groovy 1.8 might change that, if it proves to be worth a try.
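The boxing cost can be pictured with a small Java sketch. `plus` is a hypothetical stand-in for a dynamic runtime's addition helper, not Groovy's actual API: when every value on the operand stack is an `Object`, each intermediate result goes back through a boxed `Integer`, while the javac-style loop keeps everything in `int` locals. (Pattern-matching `instanceof` needs Java 16 or later.)

```java
class BoxedMath {
    // Hypothetical dynamic "+" helper: both operands arrive as Object and
    // the operation is chosen from their runtime classes on every call.
    static Object plus(Object a, Object b) {
        if (a instanceof Integer x && b instanceof Integer y) {
            return x + y; // int result, boxed again to Integer
        }
        if (a instanceof Number p && b instanceof Number q) {
            return p.doubleValue() + q.doubleValue(); // boxed to Double
        }
        throw new IllegalArgumentException("unsupported operands: " + a + ", " + b);
    }

    // "Everything is an Object" style: loop counter and accumulator are boxed.
    static Object sumBoxed(int n) {
        Object acc = 0;
        for (Object i = 0; (Integer) i < n; i = plus(i, 1)) {
            acc = plus(acc, i);
        }
        return acc;
    }

    // What javac emits for the same loop: primitive int locals only.
    static int sumPrimitive(int n) {
        int acc = 0;
        for (int i = 0; i < n; i++) {
            acc += i;
        }
        return acc;
    }
}
```

Both loops compute the same sum; the difference is only in how many type checks, unboxings and boxings happen per iteration before HotSpot gets a chance to remove them.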

> ---------------------------------------------Problem1---------------------------------------------
> P1) Whole program type inference to allow use of jvm primitives for 
> their numeric types and math.
>            (Even the use of Strings and other types we tend to be 
> a HolderForATypicalJREClass)
>
> Some strategies used:    (So far the compilers that get closest to 
> optimal have done the hard ones like S5 and S1,) 
>  
> S1) Some people had to "force themselves" to use java primitives.  And 
> when they could not get away with it, at least used
>  some  JRE version of a java.lang.PrimitiveHolder.  Stayed frustrated 
> for a few hours but finally conceded that this was no longer their problem.

In Groovy, int and Integer are almost aliases. The pain is for the 
language writers then ;) In fact, if you declare a local variable of type 
int, Groovy will store an object of type Integer there. That is not 
100% native.

> S2) Some (like myself) used  my.lang.PrimitiveHolder. Consoling myself 
> with "java did it!" or "when it's time, *eventually* will fix this to use 
> primitives" or "what knucklehead thought 16 bits is enough to represent 
> any character?" or  "they forgot unsigned byte.. I need a special holder 
> that helps mark this as so"

I think you mean numeric types differing from what Java provides. This is 
a decision for your language... only, what if you have to interface with 
Java and your type system does not contain the types Java knows? You 
will have to convert all the time, maybe even losing information in the 
process.
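Two lossy boundaries of this kind, sketched in Java with JDK calls only (the class and method names are hypothetical): a code point above U+FFFF does not fit in a 16-bit `char`, and an unsigned byte value above 127 only survives in a Java `byte` if every consumer reinterprets it.

```java
class LossyConversions {
    // A character outside the Basic Multilingual Plane needs two Java chars
    // (a surrogate pair); a plain cast to char silently drops the high bits.
    static char castToChar(int codePoint) {
        return (char) codePoint;
    }

    // The lossless alternative: encode the code point as a surrogate pair.
    static String toJavaString(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    // Java has no unsigned byte: 200 is stored as -56 ...
    static byte toJavaByte(int unsignedValue) {
        return (byte) unsignedValue;
    }

    // ... and must be reinterpreted explicitly whenever it crosses back.
    static int fromJavaByte(byte b) {
        return Byte.toUnsignedInt(b);
    }
}
```

For example, `castToChar(0x1F600)` yields the unrelated character U+F600, whereas `toJavaString(0x1F600)` produces a two-`char` string that still round-trips through `codePointAt(0)`.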

> S3) Create a couple generic HolderForJavaObject and optimize the use of 
> reflection 
>            or make HolderForWidelyUsedJREInterface(s)

In my experience this does not really pay off, especially if such 
holders contain generic logic and cause the creation of a 
megamorphic call site.
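A rough Java sketch of the shape being warned about (all names are hypothetical): one generic holder interface with several implementations, all funnelled through a single call site. When HotSpot's type profile for that `call` site records more receiver classes than its inline caching handles (typically more than two, unless one type dominates), the call usually stays an unoptimized virtual dispatch, so the generic logic inside the holders cannot be specialized per caller.

```java
import java.util.List;
import java.util.Map;

// Hypothetical generic holder: all operations go through one string-keyed method.
interface Holder {
    Object call(String op);
}

class StringHolder implements Holder {
    private final String value;
    StringHolder(String value) { this.value = value; }
    public Object call(String op) {
        if (op.equals("size")) return value.length(); // result boxed to Integer
        throw new UnsupportedOperationException(op);
    }
}

class ListHolder implements Holder {
    private final List<?> value;
    ListHolder(List<?> value) { this.value = value; }
    public Object call(String op) {
        if (op.equals("size")) return value.size();
        throw new UnsupportedOperationException(op);
    }
}

class MapHolder implements Holder {
    private final Map<?, ?> value;
    MapHolder(Map<?, ?> value) { this.value = value; }
    public Object call(String op) {
        if (op.equals("size")) return value.size();
        throw new UnsupportedOperationException(op);
    }
}

class HolderDemo {
    // The single call site h.call(...) sees three receiver classes; in
    // HotSpot's profile this typically makes it megamorphic.
    static int totalSize(List<? extends Holder> holders) {
        int total = 0;
        for (Holder h : holders) {
            total += (Integer) h.call("size");
        }
        return total;
    }
}
```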

> S4) Guess what the user will likely use based on the types from MyLang:
>        MyLangJavaNetSocket, MyLangJavaReader, MyLangJavaWriter, etc

Well, in Groovy this is absolutely out of the question. We think this kind 
of interfacing is just butt ugly. If you can hide it completely, then it 
is OK, but if the user must use those types, then no. And for Groovy we 
consider not only the Groovy side here, but the Java side as well. One design 
goal for Groovy is to make interfacing Groovy/Java as seamless as 
possible in both directions.

> S5)  Or enforce strong typing and never cheat using a 
> HolderForATypicalJREClass... force my compiler to invoke-virtual 

You can use JVM objects and not use strong typing on them? How does this 
work... ah well, maybe my definition of strong typing is different. For 
me it means you cannot change the type of an object to something else 
without creating a new one. In the case of an OO language this is usually 
softened so that type changes are allowed if the target is an implemented 
interface or a parent class. On the static side it is the downcast that 
softens the system, but it still requires a runtime check. Anyway, S5 
kind of implies you are not really using the Java object model.
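The downcast case in Java terms, as a small sketch with a hypothetical class name: the cast compiles, and the `checkcast` instruction javac emits for it performs the runtime check. The object's class never changes; only the static type through which it is viewed does.

```java
class DowncastDemo {
    // Returns the argument viewed as a String, or null when the runtime
    // check (the checkcast instruction behind the cast) rejects it.
    static String asString(Object o) {
        try {
            return (String) o;
        } catch (ClassCastException e) {
            return null;
        }
    }
}
```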

[...]
> * I know I've missed a few (we could try on this list to keep enumerating 
> them)

The basic rule for the JVM, I think, is that if you don't want to have any 
problems, then your language needs to be 100% compatible with Java. 
Groovy is not, so at some small points we have problems. Scala is/was 
not either; I don't know the current state. Still, if you are 100% compatible 
with a language as huge as Java, then this implies so many things that 
there is hardly anything in your language you can make different.

> ---------------------------------------------Problem2---------------------------------------------
> P2) Excluding P1, if some user would have just written 
> their application in .java in many cases,
>           The bytecode that would have resulted from their code would be 
> a tad better than ours.

It is not really the bytecode that is better; it is the HotSpot compiler 
that knows the patterns the Java compiler emits better than other 
patterns. And this really matters only if you are creating patterns 
HotSpot cannot handle. I think all interpreted languages are already 
excluded here. I think this point mostly targets performance... but there 
are so many other places where you can easily lose performance too. 
Optimizing a language runtime for the JVM is a tough job.

> S8) invoke static 
> to MyLangLibrary.tooWierdOfFeatureToCompileSmartly(MyLangASTLikeObject 
> obj) {... written in java ..}

What would that be? I found that sometimes I want to do things the 
bytecode allows, but Java as a language does not. In that case it would 
not help me. And in the other cases I can just as well emit the bytecode 
I need myself. Also, this kind of implies having a runtime AST of some 
kind. In that case you are well on your way to writing an interpreter, 
including all the stack information problems.

> S9) Make the user just write java in a new featurefull syntax

I think surprisingly many people actually want this.

> S10) inline S8 into bytecode  (Not a strategy: But a lot of people think 
> it is.. profiling might tell otherwise.. ("method too large to JIT"))

Which reminds me... is there an easy way to tell whether a method is too 
large to JIT?

> S11) have 2nd-4th passes that optimize their intermediate proposals 
> before bytecode representation is emitted.

That usually requires a compiled language, probably with a long-running 
compiler. But since the optimizations are mostly done by HotSpot, I don't 
see much point in doing that. Often enough an old optimization strategy 
turned out to slow things down, because a new generation of HotSpot had 
learned to optimize the original construct.
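For readers who have not written such a pass, here is a minimal Java sketch of one (Java 16+ for records): a constant-folding pass over a tiny, hypothetical expression tree, run before any bytecode is emitted. As argued above, HotSpot often performs this kind of folding itself, which is one reason a pass like this may buy little on the JVM.

```java
// Hypothetical expression tree for a toy language.
interface Expr {}
record Num(int value) implements Expr {}
record Var(String name) implements Expr {}
record Add(Expr left, Expr right) implements Expr {}
record Mul(Expr left, Expr right) implements Expr {}

class ConstantFolder {
    // One optimization pass: rebuild the tree bottom-up, replacing operations
    // whose operands are both literals by the computed literal.
    static Expr fold(Expr e) {
        if (e instanceof Add a) {
            Expr l = fold(a.left()), r = fold(a.right());
            if (l instanceof Num x && r instanceof Num y) {
                return new Num(x.value() + y.value());
            }
            return new Add(l, r);
        }
        if (e instanceof Mul m) {
            Expr l = fold(m.left()), r = fold(m.right());
            if (l instanceof Num x && r instanceof Num y) {
                return new Num(x.value() * y.value());
            }
            return new Mul(l, r);
        }
        return e; // literals and variables are already in folded form
    }
}
```

For instance, `x + 2 * 3` folds to `x + 6`, while the addition itself stays because `x` is unknown at compile time.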

> S12)  Some may even run bytecode optimizations like SOOT 
>       One JVM lang even had to write their own post process: GJIT 
> (http://code.google.com/p/gjit/), 

"even had to" is not really right. Groovy works perfectly without it. 
The goal of GJIT is to make Groovy faster. If you think of the 
bytecode as control code for an interpreter, then GJIT is there to emit 
optimized interpreter code by removing some checks, boxing, method 
selection and all that. This can easily turn into debugging hell if 
the bytecode the optimizer emits has an error, because that bytecode 
exists only at runtime and you cannot even take a look at it. 
Calling the Java compiler here is just too slow.

Still we probably would have made it the standard for Groovy if it were 
not for the problem that you need a second VM to start the agent and the 
security model may forbid the attachment of that agent to the other VM. 
Imagine for example trying this on Google App Engine.
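For context, an optimizer like GJIT hooks into the JVM roughly like the following skeleton, built on the standard `java.lang.instrument` API. The class names and the package-prefix filter are hypothetical, and the transformer hands the bytes back unchanged where a real optimizer would rewrite them. `agentmain` is the entry point used when the agent is loaded into an already running VM through the Attach API (`com.sun.tools.attach.VirtualMachine`), which is the step a security model may forbid; the agent JAR's manifest must also name the agent class in its `Agent-Class` attribute.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

// In a real agent JAR this class would be public and listed as Agent-Class.
class OptimizingAgent {
    // Entry point when the agent is attached to a running VM.
    public static void agentmain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new PrefixTransformer("com/example/generated/"), false);
    }
}

// Invoked by the VM for each class it loads after registration.
class PrefixTransformer implements ClassFileTransformer {
    private final String prefix;

    PrefixTransformer(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public byte[] transform(ClassLoader loader, String className,
                            Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain,
                            byte[] classfileBuffer) {
        if (className == null || !className.startsWith(prefix)) {
            return null; // null tells the VM to keep the class unchanged
        }
        // Placeholder: an optimizer would return rewritten bytecode here.
        return classfileBuffer.clone();
    }
}
```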

Since we want to be able to write frameworks in Groovy that are used 
from Java, Groovy has to be usable as a kind of library. Starting a 
second VM is not an option for that. And then Groovy would be fast from 
the command line, but slow as a library? We don't want that. Not to 
mention that at some point in the future we may have to handle multiple 
Groovy versions running in parallel.

For almost the same reason, bytecode weaving in a normal class loader does 
not work. The class might be loaded through a loader I don't control, and 
then there is nothing I can do.

[...]
> ---------------------------------------------Problem3---------------------------------------------
> P3)   Maintaining and improving our compilers to solve P1 & P2 better.
>  
>  
> S13) Improve the built in functions of our runtime
>  
> S14)  Use JAVAP/JAD/JODE on the emitted output.    (I think we all are 
> constantly doing this.. but...)

I don't use JAVAP/JAD/JODE for that, since there is no clear 
transformation from Groovy-generated bytecode to Java.

> !!!!Now the reason for my Email!!!!
>  
> S15)  Create a .Java emitter branch
>  
>   Instead of emitting bytecode .. make compilers emit java code that 
> would have produced the same bytecode
>    (jvm language permitting/omitting optimizations = Can someone on the list 
> please enumerate what they cannot translate to .java?)
>       
>     
>           - Compile an entire user program into a tree of .java files
>           - Decide if that was how you would have written their program 
> in .java
>                 If not, see how many simplistic changes you could make 
> that could be done at compiler/translator level
>           - Get an overview of how many changes you'd like to make (like 
> S1 thru S9) at the compiler but that would just be too grueling
>           - Could you write an AST transform/refactor tool 
> targeting this .java source to improve it? 
>                If so, can this be incorporated into the compiler as a pass?
>           - Now that their program is a .java program.. Include your 
> Runtime and make a java project out of it.
>              since you are a java hacker, you can experiment and profile 
> what are good and bad ideas better. to find what optimizations are worth it
>              With that information .. re-include it into the 
> emitter/compiler.
>  
> Pretty much IMO most compiler writers that try to do all the above in 
> typical S14 and profiling and have even reached best-cases quite often ..
> Most jvm languages go directly from source to bytecode: 
>   We want to get to execution quickly as possible or Can advertise as a 
> real compiler. or Makes the user feel more secure with their proprietary 
> code
>  
> But regardless feel they never reached "Whole program" efficiency I 
> think it is because the limited scope of S14

I think you would need a language that is semantically almost exactly 
like Java and allows a transformation to Java source without having to add 
information and without losing information. How do you make a Groovy-style 
method call from Java? Doing it purely through runtime code may, for 
example, ignore runtime bytecode generation.
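To illustrate why a translation to Java source needs extra runtime code: a Groovy-style call has to become something like the hypothetical helper below, which picks a method by name and runtime argument types using reflection. This is a sketch under simplifying assumptions, not how Groovy's runtime works; it ignores Groovy's method-selection rules, metaclasses, varargs, and call-site caching.

```java
import java.lang.invoke.MethodType;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

class DynamicCall {
    // Hypothetical runtime helper: resolves receiver.name(args...) at run time
    // from the receiver's class and the runtime argument types.
    static Object invoke(Object receiver, String name, Object... args) {
        for (Method m : receiver.getClass().getMethods()) {
            if (m.getName().equals(name) && accepts(m.getParameterTypes(), args)) {
                try {
                    return m.invoke(receiver, args);
                } catch (IllegalAccessException | InvocationTargetException e) {
                    throw new RuntimeException(e);
                }
            }
        }
        throw new IllegalArgumentException(
                "no method " + name + " on " + receiver.getClass().getName());
    }

    // True if each argument fits its parameter, comparing primitive parameters
    // through their wrapper classes (int -> Integer). The first match wins;
    // there is no "most specific method" selection as in javac or Groovy.
    private static boolean accepts(Class<?>[] params, Object[] args) {
        if (params.length != args.length) {
            return false;
        }
        for (int i = 0; i < params.length; i++) {
            Object arg = args[i];
            if (arg == null) {
                if (params[i].isPrimitive()) {
                    return false;
                }
            } else if (!MethodType.methodType(params[i]).wrap().returnType().isInstance(arg)) {
                return false;
            }
        }
        return true;
    }
}
```

For example, `DynamicCall.invoke("hello", "substring", 1, 3)` returns `"el"`. Even this simplified version needs a runtime helper and a reflective lookup per call, neither of which appears in the original source.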

> I suspect the more time one spends in the java-emitter branch of their 
> jvm language (dealing with entire user programs),
>    the better their compiler trunk will become.

Of course I sometimes write a construct in Java, compile it, and then 
have my compiler emit equivalent bytecode. But the Java compiler is not 
really an optimizing compiler, so there are no iteration steps to be 
done to further optimize anything.

> Here is an example of a compiler-as-emitter.. An entire user's program 
> was translated from SubLisp to .java
>  
> line 117 of
> http://larkc.svn.sourceforge.net/viewvc/larkc/trunk/platform/src/com/cyc/cycjava/cycl/constant_reader.java?revision=254&view=markup
>  
> It uses a for/next loop on a boxed Fixnum. But if that was fixed.. line 
> 118 which only consumes Fixnums has to be updated as well.
> And so the cycle begins, replacing fixnums with primitive 'long's (or 
> maybe 'int's if array index accesses will be found in the call tree).
> This code could be manipulated with some heavy java AST manipulation 
> tools. Same way bytecode could have been..
>  
> I just bring all this up in case people are looking for new ideas and 
> might find sometimes easier to work with java in their compiler 
> improvement workflows.
>  
> Myself, once I got the .java down that I thought best it's just as easy 
> to compile and then get the best bytecode version for a compiler.

Well, in Groovy

for ( def i in x) {
   println i
}

means something like:

for (Iterator it = x.><iterator(); it.hasNext();) {
   Object i = it.next();
   this.><println(i);
}

I used >< to mark Groovy method calls, which are not Java method calls. 
In Groovy for example iterator() and various println methods are defined 
on Object already. I could have written:

for (Object i : runtime.getIterable(x)) {
   this.><println(i);
}

but this requires a runtime method and a wrapper, neither of which was 
needed before.
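A minimal Java sketch of the kind of `runtime.getIterable(x)` helper the second translation assumes. Only the name `getIterable` comes from the text above; the class name and the conversion rules (an `Iterable` as-is, arrays element by element, a `Map` through its entry set, `null` as empty, anything else as a one-element sequence) are assumptions loosely modelled on Groovy iterating over arbitrary objects, not its actual implementation.

```java
import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

class LoopRuntime {
    // Hypothetical helper: turns any receiver into something a Java
    // enhanced for loop can iterate over.
    static Iterable<?> getIterable(Object x) {
        if (x == null) {
            return Collections.emptyList();
        }
        if (x instanceof Iterable<?> it) {
            return it;
        }
        if (x instanceof Map<?, ?> map) {
            return map.entrySet();
        }
        if (x.getClass().isArray()) {
            // This copy is the extra "wrapper" the plain Iterator version avoided.
            List<Object> elements = new ArrayList<>();
            for (int i = 0; i < Array.getLength(x); i++) {
                elements.add(Array.get(x, i)); // boxes primitive elements
            }
            return elements;
        }
        return Collections.singletonList(x);
    }
}
```

With it, the second loop compiles as ordinary Java; for example, `for (Object i : LoopRuntime.getIterable(new int[] {1, 2, 3}))` visits the boxed values 1, 2 and 3.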

bye blackdrag

-- 
Jochen "blackdrag" Theodorou
The Groovy Project Tech Lead (http://groovy.codehaus.org)
http://blackdragsview.blogspot.com/
