John Wilson wrote:
> I'm looking at a way of minimising the classes generated for closures
> and I'm thinking of compiling the closure body as a synthetic static
> method in the enclosing class. The Closure object would then be an
> instance of a generic closure class which dispatches to the static
> method via reflection.

This is how JRuby compiled closures. I will do a review of the JRuby 
compiler design below.

> It occurs to me that this could be generalised to allow the generation
> of lightweight method objects. If we had a way of dynamically adding
> static methods to some utility class returning an instance of
> java.lang.reflect.Method then these could be used as lightweight method
> objects.
> 
> Imagine a class java.util.DynaHome with a single method Method
> makeMethod(byte[]). Calling that method with some bytecode would add
> the static method to java.util.DynaHome. It would have an arbitrary
> unique name and an instance of Method would be returned which allows
> the method to be called. When the instance of Method is GCd the method
> is removed from java.util.DynaHome.
> 
> I have absolutely no idea how feasible this is but I think it, or
> something like it would be pretty useful.

I suppose it would be just fine if it were possible to add methods to 
anything at all. Lacking that...

So, JRuby compiler design 101.

JRuby compiles Ruby code to Java bytecode. Once complete, there's no 
interpretation done, except for eval calls. Evaluated code never gets 
compiled; however, if the eval defines a method that's called enough, it 
will also eventually get JIT compiled to bytecode. JRuby is a mixed-mode 
engine.

Given a single input .rb file, JRuby produces a single output .class 
file. This was a key design goal I wanted for the compiler; other 
languages (including Groovy) and other Ruby implementations (including 
XRuby) produce numerous classes from an input file; in some cases, 
dozens and dozens of classes if the input file is very large and 
complex. JRuby produces one .class file.

JRuby compiles from the same AST it interprets from. There is a first 
pass over the AST before compilation to determine certain runtime 
characteristics:

- does a method have closures in it?
- does a method have calls to eval or other scope and frame-aware methods?
- does a method have class definitions in it?
- does a method define other methods?
- ... and so on

Based on this pass, we determine the scoping characteristics of all code in 
the method, choosing either pure heap-based variables or pure 
stack-based variables. Only methods and leaf closures without eval, 
nested closures, etc. can use normal stack-based local variables. Code 
using stack variables runs significantly faster.
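
To make the pre-pass concrete, here is a minimal sketch in Ruby. The toy AST (nested arrays such as [:call, :puts, "hello"]), the ScopeInspector name, the flag names and the list of scope-aware calls are all hypothetical; none of this is JRuby's real AST or API. It only illustrates scanning a body once and then picking heap or stack variables.

```ruby
# Toy AST: each node is [type, *children]; strings, numbers and symbols are leaves.
# Everything here is a hypothetical illustration, not JRuby's internals.
class ScopeInspector
  # Calls that need access to the caller's scope or frame (illustrative list).
  SCOPE_AWARE_CALLS = %i[eval binding local_variables].freeze

  attr_reader :flags

  def initialize
    @flags = { closure: false, scope_aware_call: false,
               class_def: false, method_def: false }
  end

  # Walk the body of one method (or closure) and record what it contains.
  def inspect_body(nodes)
    nodes.each { |node| visit(node) }
    self
  end

  # Stack variables are only chosen when no flag was raised by the scan.
  def storage
    flags.values.any? ? :heap : :stack
  end

  private

  def visit(node)
    return unless node.is_a?(Array)

    type, *children = node
    case type
    when :closure then flags[:closure] = true
    when :class   then flags[:class_def] = true
    when :def     then flags[:method_def] = true
    when :call
      flags[:scope_aware_call] = true if SCOPE_AWARE_CALLS.include?(children.first)
    end
    children.each { |child| visit(child) }
  end
end

# A simple body, a body containing a closure, and a body calling eval.
simple_body  = [[:call, :puts, "hello"]]
closure_body = [[:call, :times, 1, [:closure, [:call, :puts, "in closure"]]]]
eval_body    = [[:call, :eval, "x = 1"]]

puts ScopeInspector.new.inspect_body(simple_body).storage
puts ScopeInspector.new.inspect_body(closure_body).storage
puts ScopeInspector.new.inspect_body(eval_body).storage
```

The sketch is deliberately conservative: any flag at all pushes the whole body to heap-based variables, which mirrors the "pure heap or pure stack" choice described above.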

The class file JRuby produces contains, at a minimum, these methods:

- a normal main() method for running from the command line (grabs a 
default JRuby runtime and launches itself)
- a load() instance method that represents a normal top-level loading of 
the script into a runtime. This performs pre/post script setup and teardown.
- a run() instance method that represents a bare execution of the 
script's contents. This is used by the JIT, where setup/teardown is 
handled outside the JITed code on a method-by-method basis
- a __file__() method that represents the body of the script. This is 
where script execution eventually starts.

Then, depending on the contents of the file, additional methods are added:

- normal method definition bodies become Java methods
- class/module bodies become Java methods
- closure bodies become Java methods
- rescue/ensure bodies become synthetic methods
- if the normal top-level script method is too long, it's split every 
500 top-level syntactic elements and chained (we did run into one large 
flat file that broke the method size limit). We do not yet perform 
chaining on normal method bodies, because we have not encountered any 
that are too large.
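
The splitting of an oversized top-level body can be sketched roughly as follows. The 500-element limit comes from the description above; the chunk method names and the assumption that "chained" means each piece ends by calling the next are mine, and this is a plan over plain Ruby data rather than the bytecode JRuby emits.

```ruby
# Hypothetical sketch: plan how a long top-level body could be split into
# chained methods. Names such as "__file__chained_1" are made up.
CHUNK_SIZE = 500

def plan_chained_methods(top_level_elements, chunk_size = CHUNK_SIZE)
  chunks = top_level_elements.each_slice(chunk_size).to_a
  chunks.each_with_index.map do |elements, index|
    {
      # The first chunk keeps the normal script body name.
      name: index.zero? ? "__file__" : "__file__chained_#{index}",
      elements: elements,
      # Every chunk except the last ends by calling the next chunk.
      calls_next: index < chunks.size - 1 ? "__file__chained_#{index + 1}" : nil
    }
  end
end

# 1200 stand-in elements are planned as three methods of 500, 500 and 200.
plan = plan_chained_methods((1..1200).to_a)
plan.each do |m|
  puts "#{m[:name]}: #{m[:elements].size} elements, next: #{m[:calls_next].inspect}"
end
```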

Of these, only class bodies, rescue/ensure bodies, and chained top-level 
script methods get directly invoked during script execution. The others 
are bound into the MOP at runtime.

Binding occurs in one of two ways:

- by generating a small stub class that implements DynamicMethod and 
invokes the target method on the target script directly
- by doing the same with reflection (broken now due to lack of use; will 
be fixed for 1.1)

In our testing, generating stub "invoker" classes has always been faster 
than reflection, especially on older JVMs. For the time being, that's 
the preferred way to bind methods, but I'm going to get reflection-based 
binding working again for limited/restricted environments like applets. 
With reflection-based binding and pre-compiled Ruby code with no evals, 
JIT compilation could be completely turned off and no classes would ever 
be generated in memory by JRuby.
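
As a rough analogy for the two binding strategies, here is a sketch in plain Ruby. In JRuby the stub is a generated Java class implementing DynamicMethod and the alternative uses Java reflection; the classes below (SampleScript, BarInvoker, ReflectiveInvoker) and the method name method__bar are hypothetical stand-ins, and the hash is only a toy substitute for the MOP.

```ruby
# Stand-in for a compiled script: each Ruby method body became a method here.
class SampleScript
  def method__bar(receiver, args)
    "hello from bar (#{args.join(', ')})"
  end
end

# Strategy 1: a small stub per target method. The call to the target is
# hard-wired in the stub, so no lookup by name happens on each call.
class BarInvoker
  def initialize(script)
    @script = script
  end

  def call(receiver, args)
    @script.method__bar(receiver, args)
  end
end

# Strategy 2: one generic invoker for every method. The target is found by
# name at call time, which is the reflection-style approach.
class ReflectiveInvoker
  def initialize(script, method_name)
    @script = script
    @method_name = method_name
  end

  def call(receiver, args)
    @script.public_send(@method_name, receiver, args)
  end
end

# A toy method table standing in for the MOP.
script = SampleScript.new
method_table = {
  bar_direct: BarInvoker.new(script),
  bar_reflective: ReflectiveInvoker.new(script, :method__bar)
}

puts method_table[:bar_direct].call(nil, [1, 2])
puts method_table[:bar_reflective].call(nil, [1, 2])
```

Both entries produce the same result; the trade-off is one extra class per bound method for the stub approach versus a by-name lookup per call for the generic one.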

So then here's a walkthrough of a simple script:

# we enter into the script body in the __file__ method
# require would first look for .rb files, then try to load .class
require 'foo'

# normal code in the method body
puts 'here we go'

# upon encountering a method def, a new method is started in the class
def bar
   # this is a simple method body, and would use stack-based vars
   puts 'hello'
end
# once the method has been compiled, binding code is added to __file__

# class definitions become methods as well, building the class
class MyClass
   # this is code in the body of the class
   puts 'here'

   # a method in the class is compiled like any other method body
   def something(a, b = 2, *c, &block)
     # this method has all four param types:
     # normal, optional, "rest" or varargs, and block argument
     # the compiler generates code to assign these from an incoming
     # IRubyObject[]

     # this method has a closure, so it would use heap-based vars
     # ... but the closure would use stack vars, since it's a simple leaf
     1.times { puts 'in closure' }
   end
   # method is completed, bound into the class we're building
end
# end of class definition; __file__ code invokes the class body directly

# any begin block or method body with a rescue/ensure attached will
# be compiled as a synthetic method. This also necessarily means that
# method bodies containing rescue/ensure must be heap-based.
begin
   puts 'rescue me'
rescue
   puts 'rescued!'
ensure
   puts 'ensured!'
end
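
As an aside on the something(a, b = 2, *c, &block) signature in the script above, the argument assignment can be pictured like this. It is a sketch in Ruby with a plain Array standing in for the incoming IRubyObject[]; the helper name assign_args and the returned hash are hypothetical, and the real compiler emits bytecode rather than Ruby.

```ruby
# Hypothetical sketch of the assignments for:
#   def something(a, b = 2, *c, &block)
# args plays the role of the incoming argument array.
def assign_args(args, block = nil)
  # One required parameter, so at least one argument must be present.
  raise ArgumentError, "wrong number of arguments (#{args.size} for 1+)" if args.empty?

  a = args[0]                          # normal (required) argument
  b = args.size > 1 ? args[1] : 2      # optional argument with its default
  c = args.drop(2)                     # "rest" arguments; empty if none remain
  { a: a, b: b, c: c, block: block }   # block argument is passed through as-is
end

p assign_args([1])
p assign_args([1, 5, 6, 7])
```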

A sample run of the JRuby compiler:

~/NetBeansProjects/jruby $ jruby sample_script.rb
here we go
here
rescue me
ensured!

~/NetBeansProjects/jruby $ jrubyc sample_script.rb
Compiling file "sample_script.rb" as class "sample_script"

~/NetBeansProjects/jruby $ ls -l sample_script.*
-rw-r--r--   1 headius  headius  8396 Oct  4 09:38 sample_script.class
-rw-r--r--   1 headius  headius  1449 Oct  4 09:38 sample_script.rb

~/NetBeansProjects/jruby $ export CLASSPATH=lib/jruby.jar:lib/asm-3.0.jar:lib/jna.jar:.

~/NetBeansProjects/jruby $ java sample_script
here we go
here
rescue me
ensured!

The resulting .class file is attached for your enjoyment!

Shall I continue? I can discuss the inline cache, the call adapters we 
generate for dynamic dispatch, the fast switch-based dispatcher, how the 
JIT and interpreter work together, or any other details anyone would like.

- Charlie

