John Wilson wrote:
> I'm looking at a way of minimising the classes generated for closures
> and I'm thinking of compiling the closure body as a synthetic static
> method in the enclosing class. The Closure object would then be an
> instance of a generic closure class which dispatches to the static
> method via reflection.
This is how JRuby compiles closures. I'll review the JRuby compiler
design below.
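To make the quoted idea concrete, here is a minimal sketch in plain Java of a closure body compiled as a static method, with a single generic closure class that dispatches to it via reflection. The names ClosureDemo, ReflectiveClosure and closure$0, and the (captured, args) calling convention, are assumptions made up for this illustration; this is not JRuby's or Groovy's actual generated code.

```java
import java.lang.reflect.Method;

final class ClosureDemo {
    // Generic closure object: one class serves every closure in the program.
    static final class ReflectiveClosure {
        private final Method body;        // the synthetic static method
        private final Object[] captured;  // values captured from the enclosing scope

        ReflectiveClosure(Class<?> owner, String name, Object... captured) {
            Method m;
            try {
                // Hypothetical convention: every closure body takes (captured, args).
                m = owner.getDeclaredMethod(name, Object[].class, Object[].class);
            } catch (NoSuchMethodException e) {
                throw new IllegalArgumentException(e);
            }
            m.setAccessible(true);
            this.body = m;
            this.captured = captured;
        }

        Object call(Object... args) {
            try {
                // The target is static, so the receiver argument is null.
                return body.invoke(null, captured, args);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    // What a compiler might emit in the enclosing class for { |x| x + offset }.
    static Object closure$0(Object[] captured, Object[] args) {
        int offset = (Integer) captured[0];
        int x = (Integer) args[0];
        return x + offset;
    }

    // Builds the closure object and calls it once.
    static Object run(int offset, int x) {
        ReflectiveClosure c = new ReflectiveClosure(ClosureDemo.class, "closure$0", offset);
        return c.call(x);
    }

    public static void main(String[] args) {
        System.out.println(run(10, 5)); // prints 15
    }
}
```

No class is generated per closure here; the cost is a reflective call on every invocation.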
> It occurs to me that this could be generalised to allow the generation
> of lightweight method objects. If we had a way of dynamically adding
> static methods to some utility class returning an instance of
> java.lang.reflect.Method then these could be used as lightweight method
> objects.
>
> Imagine a class java.util.DynaHome with a single method Method
> makeMethod(byte[]). Calling that method with some bytecode would add
> the static method to java.util.DynaHome. It would have an arbitrary
> unique name and an instance of Method would be returned which allows
> the method to be called. When the instance of Method is GCd the method
> is removed from java.util.DynaHome.
>
> I have absolutely no idea how feasible this is but I think it, or
> something like it would be pretty useful.
I suppose it would be just fine if it were possible to add methods to
anything at all. Lacking that...
So, JRuby compiler design 101.
JRuby compiles Ruby code to Java bytecode. Once compilation is complete,
nothing is interpreted except code passed to eval. Evaluated code itself
is never compiled; however, if it defines a method that's called often
enough, that method will eventually get JIT compiled to bytecode. In that
sense JRuby is a mixed-mode engine.
Given a single input .rb file, JRuby produces a single output .class
file. This was a key design goal I wanted for the compiler; other
languages (including Groovy) and other Ruby implementations (including
XRuby) produce numerous classes from an input file; in some cases,
dozens and dozens of classes if the input file is very large and
complex. JRuby produces one .class file.
JRuby compiles from the same AST it interprets from. There is a first
pass over the AST before compilation to determine certain runtime
characteristics:
- does a method have closures in it?
- does a method have calls to eval or other scope and frame-aware methods?
- does a method have class definitions in it?
- does a method define other methods?
- .... and so on
Based on this pass, we determine the scoping characteristics of all code
in the method and choose either purely heap-based or purely stack-based
variables for it. Only methods and leaf closures that contain no eval,
closures, and so on can use normal stack-based local variables. Stack
variables are significantly faster.
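A rough sketch of such a pre-compilation pass, in plain Java. The Node and Kind types are a hypothetical miniature AST invented for this example, not JRuby's AST classes. It also assumes, as a simplification, that any closure, eval call, class definition, or method definition found anywhere inside a scope forces that scope onto heap-based variables.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

final class ScopeInspectorDemo {
    enum Kind { CALL, LOCAL_ASSIGN, CLOSURE, EVAL_CALL, CLASS_DEF, METHOD_DEF }

    // Hypothetical minimal AST node: a kind plus child nodes.
    static final class Node {
        final Kind kind;
        final List<Node> children;

        Node(Kind kind, Node... children) {
            this.kind = kind;
            this.children = Arrays.asList(children);
        }
    }

    // Node kinds assumed to rule out stack-based variables in the enclosing scope.
    static final Set<Kind> HEAP_TRIGGERS =
            EnumSet.of(Kind.CLOSURE, Kind.EVAL_CALL, Kind.CLASS_DEF, Kind.METHOD_DEF);

    // Walks everything below the scope root and records which triggers occur.
    static Set<Kind> inspect(Node scopeRoot) {
        Set<Kind> found = EnumSet.noneOf(Kind.class);
        for (Node child : scopeRoot.children) {
            collect(child, found);
        }
        return found;
    }

    private static void collect(Node node, Set<Kind> found) {
        if (HEAP_TRIGGERS.contains(node.kind)) {
            found.add(node.kind);
        }
        for (Node child : node.children) {
            collect(child, found);
        }
    }

    static boolean canUseStackVariables(Node scopeRoot) {
        return inspect(scopeRoot).isEmpty();
    }

    // A leaf closure containing a single call, like { puts 'in closure' }.
    static Node leafClosure() {
        return new Node(Kind.CLOSURE, new Node(Kind.CALL));
    }

    // A method whose body is one call that is passed the leaf closure.
    static Node methodWithClosure() {
        return new Node(Kind.METHOD_DEF, new Node(Kind.CALL, leafClosure()));
    }

    public static void main(String[] args) {
        System.out.println(canUseStackVariables(methodWithClosure())); // prints false
        System.out.println(canUseStackVariables(leafClosure()));       // prints true
    }
}
```

The method containing a closure ends up heap-based, while the leaf closure itself qualifies for stack variables.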
The class file JRuby produces contains, at a minimum, these methods:
- a normal main() method for running from the command line (grabs a
default JRuby runtime and launches itself)
- a load() instance method that represents a normal top-level loading of
the script into a runtime. This performs pre/post script setup and teardown.
- a run() instance method that represents a bare execution of the
script's contents. This is used by the JIT, where setup/teardown is
handled outside the JITed code on a method-by-method basis
- a __file__() method that represents the body of the script. This is
where script execution eventually starts.
Then, depending on the contents of the file, additional methods are added:
- normal method definition bodies become Java methods
- class/module bodies become Java methods
- closure bodies become Java methods
- rescue/ensure bodies become synthetic methods
- if the normal top-level script method is too long, it's split every
500 top-level syntactic elements and chained (we did run into one large
flat file that broke the method size limit). We do not yet perform
chaining on normal method bodies, because we have not encountered any
that are too large.
Of these, only class bodies, rescue/ensure bodies, and chained top-level
script methods get directly invoked during script execution. The others
are bound into the MOP at runtime.
Binding occurs in one of two ways:
- by generating a small stub class that implements DynamicMethod and
invokes the target method on the target script directly
- by doing the same with reflection (broken now due to lack of use; will
be fixed for 1.1)
In our testing, generating stub "invoker" classes has always been faster
than reflection, especially on older JVMs. For the time being, that's
the preferred way to bind methods, but I'm going to get reflection-based
binding working again for limited/restricted environments like applets.
With reflection-based binding and pre-compiled Ruby code with no evals,
JIT compilation could be completely turned off and no classes would ever
be generated in memory by JRuby.
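The two binding styles can be sketched side by side in plain Java. Everything here is a simplified stand-in: the Invoker interface is a hypothetical substitute for DynamicMethod (whose real signature is not shown in this message), SampleScript stands in for a compiled script class, and BarInvoker is written by hand where a real compiler would generate the stub's bytecode.

```java
import java.lang.reflect.Method;

final class BindingDemo {
    // Hypothetical simplified method-object interface (not the real DynamicMethod).
    interface Invoker {
        Object call(Object self, Object[] args);
    }

    // Stand-in for a compiled script class with one compiled method body.
    static final class SampleScript {
        public Object bar(Object self, Object[] args) {
            return "hello";
        }
    }

    // Style 1: a small stub class that invokes the target method directly.
    static final class BarInvoker implements Invoker {
        private final SampleScript script;

        BarInvoker(SampleScript script) {
            this.script = script;
        }

        public Object call(Object self, Object[] args) {
            return script.bar(self, args);
        }
    }

    // Style 2: one reusable class that reaches the target through reflection.
    static final class ReflectiveInvoker implements Invoker {
        private final Object script;
        private final Method target;

        ReflectiveInvoker(Object script, String name) {
            Method m;
            try {
                m = script.getClass().getDeclaredMethod(name, Object.class, Object[].class);
            } catch (NoSuchMethodException e) {
                throw new IllegalArgumentException(e);
            }
            m.setAccessible(true);
            this.script = script;
            this.target = m;
        }

        public Object call(Object self, Object[] args) {
            try {
                return target.invoke(script, self, args);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    static Object viaStub() {
        return new BarInvoker(new SampleScript()).call(null, new Object[0]);
    }

    static Object viaReflection() {
        return new ReflectiveInvoker(new SampleScript(), "bar").call(null, new Object[0]);
    }

    public static void main(String[] args) {
        System.out.println(viaStub());       // prints hello
        System.out.println(viaReflection()); // prints hello
    }
}
```

The stub needs one extra class per bound method but makes a plain virtual call; the reflective version needs no extra classes, which is what matters in environments where generating classes is not allowed.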
So then here's a walkthrough of a simple script:
# we enter into the script body in the __file__ method
# require would first look for .rb files, then try to load .class
require 'foo'

# normal code in the method body
puts 'here we go'

# upon encountering a method def, a new method is started in the class
def bar
  # this is a simple method body, and would use stack-based vars
  puts 'hello'
end
# once the method has been compiled, binding code is added to __file__

# class definitions become methods as well, building the class
class MyClass
  # this is code in the body of the class
  puts 'here'

  # a method in the class is compiled like any other method body
  def something(a, b = 2, *c, &block)
    # this method has all four param types:
    # normal, optional, "rest" or varargs, and block argument
    # the compiler generates code to assign these from an incoming
    # IRubyObject[]
    # this method has a closure, so it would use heap-based vars
    # ... but the closure would use stack vars, since it's a simple leaf
    1.times { puts 'in closure' }
  end
  # method is completed, bound into the class we're building
end
# end of class definition; __file__ code invokes the class body directly

# any begin block or method body with a rescue/ensure attached will
# be compiled as a synthetic method. This also necessarily means that
# method bodies containing rescue/ensure must be heap-based.
begin
  puts 'rescue me'
rescue
  puts 'rescued!'
ensure
  puts 'ensured!'
end
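For the something(a, b = 2, *c, &block) method in the walkthrough, the argument assignment could look roughly like the following Java. This is only a sketch: it uses Object[] in place of IRubyObject[], passes the block as a separate Runnable parameter (an assumption made for this example), and makes up the error message.

```java
import java.util.Arrays;

final class ArgUnpackDemo {
    // Mirrors `def something(a, b = 2, *c, &block)` for an incoming args array.
    static String something(Object[] args, Runnable block) {
        if (args.length < 1) {
            throw new IllegalArgumentException("wrong number of arguments");
        }
        // normal (required) argument
        Object a = args[0];
        // optional argument: fall back to the default when not supplied
        Object b = args.length > 1 ? args[1] : Integer.valueOf(2);
        // "rest" argument: whatever remains after the first two slots
        Object[] c = args.length > 2
                ? Arrays.copyOfRange(args, 2, args.length)
                : new Object[0];
        return "a=" + a + " b=" + b + " c=" + Arrays.toString(c)
                + " block=" + (block != null);
    }

    public static void main(String[] argv) {
        // prints a=1 b=2 c=[] block=false
        System.out.println(something(new Object[] {1}, null));
        // prints a=1 b=5 c=[6, 7] block=true
        System.out.println(something(new Object[] {1, 5, 6, 7}, () -> { }));
    }
}
```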
A sample run of the JRuby compiler:
~/NetBeansProjects/jruby $ jruby sample_script.rb
here we go
here
rescue me
ensured!
~/NetBeansProjects/jruby $ jrubyc sample_script.rb
Compiling file "sample_script.rb" as class "sample_script"
~/NetBeansProjects/jruby $ ls -l sample_script.*
-rw-r--r-- 1 headius headius 8396 Oct 4 09:38 sample_script.class
-rw-r--r-- 1 headius headius 1449 Oct 4 09:38 sample_script.rb
~/NetBeansProjects/jruby $ export CLASSPATH=lib/jruby.jar:lib/asm-3.0.jar:lib/jna.jar:.
~/NetBeansProjects/jruby $ java sample_script
here we go
here
rescue me
ensured!
The resulting .class file is attached for your enjoyment!
Shall I continue? I can discuss the inline cache, the call adapters we
generate for dynamic dispatch, the fast switch-based dispatcher, how the
JIT and interpreter work together, or any other details anyone would like.
- Charlie