On Sat, Jul 25, 2009 at 11:38 PM, Yehuda Katz <wyc...@gmail.com> wrote:
> On Sat, Jul 25, 2009 at 7:26 PM, Subramanya Sastry <sss.li...@gmail.com>
> wrote:
>> 1. Open classes: This is the well known case where you can modify classes,
>> add methods, redefine methods at runtime.  This effectively means that
>> method calls cannot be resolved at compile time.  The optimization for
>> improved performance is to optimistically assume closed classes but then
>> have solid mechanisms to back out in some way (either compile-time
>> guards or run-time change detection & cache invalidation).

Correct.
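
For the reader, a minimal illustration of why call sites can't be
resolved statically (class and method names here are hypothetical):

class Greeter
  def hello; "hello"; end
end

g = Greeter.new
g.hello              # => "hello"

class Greeter        # reopened at runtime; any cached lookup is now stale
  def hello; "hi"; end
end

g.hello              # => "hi"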

> Add the ability to include modules at runtime, which has peril but promise.
> Modules get inserted in the hierarchy of a class, which means that they
> effectively become a new class. However, you can add modules directly onto
> any object at runtime (just as you can add methods directly onto a single
> object). This means that simple class caching can't work, since an object
> can have different methods than its class. However, in the case of modules,
> it is hypothetically possible to create shadow classes that represent a
> class + specific collections of modules.

This is no worse than arbitrary open classes; it is just a larger unit
of work for class modification. Currently in JRuby, our caching logic
depends on a class token, which all of class mutation, module
inclusion, and included module mutation invalidate. Outside of the
cost of doing the invalidation, they all have an equivalent impact on
inline caching or per-class caching.
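
A small sketch of why plain per-class caching breaks down here (names
hypothetical):

module Loud
  def shout; "HELLO"; end
end

class Plain; end

a = Plain.new
b = Plain.new
a.extend(Loud)           # module lands in a's singleton class only

a.shout                  # => "HELLO"
b.respond_to?(:shout)    # => false: same class, different method tables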

>> 2. Duck typing: This is also the well known case where you need not have
>> fixed types for method arguments as long as the argument objects can respond
>> to a message (method call) and meet the message contract at the time of the
>> invocation (this could include meeting the contract through dynamic
>> code generation via method_missing).  This means that you cannot statically bind
>> method names to static methods.  The optimization for improved performance
>> is to rely on profiling and inline caching.

Correct. This is, oddly enough, the least worrisome of all Ruby's
characteristics.
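
A minimal example of the dispatch problem (hypothetical classes): any
object that responds to the message satisfies the call site, so no
common supertype can be assumed.

class Duck;  def quack; "quack"; end; end
class Robot; def quack; "beep";  end; end

[Duck.new, Robot.new].each {|o| puts o.quack }   # one call site, two targets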

> AKA polymorphic dispatch. In Ruby, it is hypothetically possible to
> determine certain details at compile time (for instance, methods called on
> object literals). In general though, the idea of determining before runtime

...as long as we assume *frozen* core class literals...

> what method will be called is a fool's errand--there are simply too many
> commonly used features that can change these semantics. However--as I have
> pointed out to Charlie a number of times--in practice, classes are basically
> frozen after *some* time. In Rails, pretty much all classes reach their
> final stage at the end of the bootup phase. However, since JRuby only sees a

This is true. All systems of any reasonable maturity settle into an
"effectively frozen" set of classes. The new JavaScript VMs are also
predicated on this assumption.

> parse phase and then a generic "runtime" it's not possible for it to
> determine when that has happened. I personally would be willing to give a
> guarantee to Ruby that all classes are in a final state. This is actually
> possible in Ruby right now via:
> ObjectSpace.each_object(Class) {|klass| klass.freeze}
> ObjectSpace.each_object(Module) {|mod| mod.freeze}
> It should be possible to completely eliminate the method cache check in
> JRuby for frozen classes (if all of their superclasses are also frozen), and
> treat all method calls as entirely static. An interesting side-note is that
> most methods are JITed only *after* the boot phase is done, and it should
> also be possible to have a mode that only JITed frozen classes (to apply
> some more aggressive optimizations).

And as I've told Yehuda before (but restate here for the benefit of
the reader), this is a totally acceptable optimization.

JRuby's new/upcoming "become_java!" support is effectively freezing a
class for Java purposes. Doing a similar freeze for optimization
purposes is certainly valid...

...but it's also kind of gross. You shouldn't have to explicitly say
"be fast" to make code fast, and we need to consider optimizations as
though nobody will ever call "be_fast" or pass "--fast". The default
settings should be optimized as much as possible, and we should
consider using language-level (not API-level) features to improve that
situation (like optional static typing if you *really* need
machine-level numerics).

>> 3. Closures: This is where you can create code blocks, store them, pass
>> them around, and invoke them.  Supporting this requires allocating heap
>> frames that capture the environment and keep it around for later.  The
>> optimization for improved performance includes (a) lazy frame allocation
>> (on-demand on call paths where closures are encountered) (b) only allocating
>> frame space for variables that might be accessed later (in some cases, this
>> means all variables) (c) inlining the target method and the closure and
>> eliminating the closure altogether [using a technique from one of my
>> early IR emails] (d) special-case optimizations like the cases Charlie
>> and Yehuda
>> have identified.

Correct. (c) is perhaps the most interesting to me as a localized
optimization, and (a) + (b) are most interesting for general
optimization. (d) will be useful once we really have the appropriate
visibility and metadata we need to do those optimizations.
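
A tiny sketch of what (a) and (b) are aiming at (names hypothetical):
the block below captures only `total`, so ideally the frame holds a
single slot, allocated only on this call path.

def sum(items)
  total = 0
  items.each {|x| total += x }   # closure reads and writes only `total`
  total
end

sum([1, 2, 3])   # => 6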

> There are some additional closure perils. For one, once you have captured a
> block, you can eval a String into the block, which gives you access to the
> entire closure scope, including variables that are not used in the closure.
> As Charlie pointed out earlier, however, this can only happen if you
> actually capture the block in Ruby code. Otherwise, this behavior is not
> possible.

In the general, stupid case, the presence of a block is exactly as
damaging as the presence of a call to eval or binding, and that's how
the current compiler treats it. But in specific cases, where we can
statically or dynamically gather more information about the intended
use of a block, we can reduce the impact of a closure. If we can
determine it's passed to a "known safe" core method we can apply (c)
above, manually inlining the logic of the block directly into the
caller and never constructing a closure. If we can determine it's
passed to a method that doesn't do anything with the block other than
'yield', we can construct a lighter-weight, lower-impact closure. And
the remaining cases are <5%, so I don't really care...full deopt is
acceptable in the near term.
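
For the reader, an illustration of the peril Yehuda describes: once a
block is captured as a Proc, its binding exposes the entire enclosing
scope, including variables the block never mentions.

def capture(&blk)
  blk
end

unused = "never touched by the block"
captured = capture { :whatever }
eval("unused", captured.binding)   # => "never touched by the block"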

> You can also do things like:
> def my_method
>   [1,2,3].each { yield }
> end
> which yields the block passed into my_method, and

yield is always a *static* call to the frame's block. It's not as
problematic as it looks, or at least it's no more problematic than
other frame-local data a closure must have access to.
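
For the reader, calling the quoted method makes the pass-through
visible:

my_method { print "x" }   # prints "xxx": each inner iteration yields
                          # the block given to my_method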

> def my_method
>   [1,2,3].each {|x| return if x == 2 }
> end

Non-local returns are also not as problematic as you would expect, and
mostly just incur additional bytecode costs. The current dispatch
protocol has separate paths for "with literal block" and "no block"
that handle non-local return behavior. It's a problem, but not a
serious one. And the "jump target" of a non-local return is once again
just a frame-local value.
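
A hedged variant (hypothetical name) that makes the unwind visible:

def find_two
  [1, 2, 3].each {|x| return x if x == 2 }
  :unreached
end

find_two   # => 2: the return exits find_two from inside the block,
           # skipping the rest of #each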

> which returns from my_method. You can also alter the "self" of a block,
> while maintaining its closure, which should not have any major performance
> implications.

Which is a rare, but present case.
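
An illustration using instance_exec: self is rebound, but the closure
over `greeting` stays intact.

greeting = "hi"
blk = proc { "#{greeting} from #{self.class}" }

Object.new.instance_exec(&blk)   # => "hi from Object"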

>> 4. Dynamic dispatch: This is where you use "send" to send method
>> messages.  You can get improved performance by profiling and inline caching
>> techniques.
>
> The most common use of send is send(:literal_symbol). This is used to get
> around visibility restrictions. If it was possible to determine that send
> was actually send (and not, for instance, redefined on the object), you
> could treat send with a literal Symbol or String as a literal method
> invocation without visibility checks. It would be possible to apply this
> optimization to frozen classes, for instance. I also discussed doing a full
> bytecode flush whenever people do stupid and very unusual things (like
> aliasing a method that generates backrefs, or overriding eval or send).

Yehuda is probably right here. As we go down the list of potential
"send" usages, we see decreasing commonality. Eventually the weirdest
cases, of using send to call eval or aliasing send to something else,
essentially never happen. I think we can make a lot of assumptions
about 'send' and optimize for the 99% case without impacting anyone.
And if we want to be 100% safe, we can turn that last 1% into hard error
cases, and tell people "pass --slow if you really intend to do this".
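
The common case in question, sketched (hypothetical class): send with
a literal symbol, used purely to bypass visibility.

class Account
  private
  def balance; 100; end
end

Account.new.send(:balance)   # => 100, despite #balance being private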

>> 5. Dynamic code gen: This is the various forms of eval.  This means that
>> eval calls are hard boundaries for optimization since they can modify the
>> execution context of the currently executing code.  There is no clear way I
>> can think of at this time of getting around the performance penalties
>> associated with it.  But, I can imagine special case optimizations including
>> analyzing the target string, where it is known, and where the binding
>> context is local.
>
> This is extremely common, but mainly using the class_eval and instance_eval
> forms. These forms are EXACTLY equivalent to simply parsing and executing
> the code in the class or instance context. For instance:
> class Yehuda
> end
> Yehuda.class_eval <<-RUBY
>   def omg
>     "OMG"
>   end
> RUBY
>
> is exactly equivalent to:
> class Yehuda
>   def omg
>     "OMG"
>   end
> end
> As a result, I don't see why there are any special performance implications
> associated. There is the one-time cost of calculating the String, but then
> it should be identical to evaluating the code when requiring a file.

The performance implications come from the potential that you might
eval something *later* and we don't see it in early profiles:

def foo(call_count)
  if call_count > 10_000
    eval "horrible nasty code"   # only reached long after early profiles
  else
    nice_friendly_code           # the only path early profiles ever see
  end
end

It's the fact that eval is *arbitrarily* late to the party that
complicates things. Other code enters the party at a precise moment.

>> 6. Dynamic/Late binding: This is where the execution context comes from an
>> explicit binding argument (proc, binding, closure).  This is something I was
>> not aware of till recently.
>
> This is only present when using eval, and it would be absolutely acceptable
> to make this path significantly slower if it meant any noticeable improvement
> in the rest of the system.

The trick is how to make it lazily slower without impacting all code
that does not do this. I do not have an answer for this if we can't
do OSR (on-stack replacement).
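
For the reader, the binding form in question (names hypothetical):

def make_frame
  secret = 42
  binding            # hands the entire frame to the caller
end

b = make_frame
eval("secret", b)    # => 42; the frame must stay live after the call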

> The truth is that send itself is rather uncommon, and when it occurs it is
> almost always with a Symbol or String literal. If you just did a pure deopt
> in the case of send with a dynamic target, you'd get a lot of perf in MOST
> cases, and the same exact perf in a few cases. Sounds like a win to me.

Again true, at least as far as code I have explored. #send is usually
called with a literal, and the current call protocols optimize that.
But the fact that send has other cases does limit our optimization
potential--maybe nearly as much as eval--and without OSR we have very
limited options.

> Here's an example of an actual use-case in Rails:
...
> This may seem insane at first glance, but there are a number of mitigating
> factors that make this easy to optimize:
>
> The eval happens once. This method simply provides parse-time declarative
> features to Rails controllers. You can think of helper_method as a
> parse-time macro that is expanded when the class is evaluated.
> The send actually isn't dynamic at all. If you call helper_method(:foo),
> that send gets expanded to: controller.send(%(foo), *args, &blk), which is a
> String literal and can be compiled into a method call without visibility
> check.
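
A hedged reconstruction of the pattern described above (the actual
Rails source was elided; `helpers_module` is hypothetical):

def helper_method(*names)
  names.each do |name|
    helpers_module.module_eval <<-RUBY
      def #{name}(*args, &blk)
        controller.send(%(#{name}), *args, &blk)
      end
    RUBY
  end
end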

This is also true. Although an inaccurate long-term profile may spell
DOOM for JRuby, the risk can be mitigated if the assumptions are defined
clearly and tested well. Most such calls are made very early in
execution, and are rarely if ever made again. No performant framework
can afford to be
evaluating new code at arbitrary times in the future.

I think we still need to consider OSR techniques in a JVM-friendly
way, but failure in 1% of cases may actually be an acceptable
situation, if users are satisfied that their long-term behavioral
needs are going to be met.

>> This is a contrived example, but basically this means you have to keep
>> around frames for long times till they are GCed.  In this case
>> delayed_eval_procs keeps around a live ref to the 20 frames created by foo
>> and bar.
>
> However, in the only case where you care about the backref information
> in frames (for instance), you only care about the LAST backref that is
> generated, which means that you only need one slot. Are you thinking
> otherwise? If so, why?

Different frame fields have different lifetimes. We need to formalize
those lifetimes, and I suspect a number of them will be happily
encompassed by a single thread-local "out" variable. Some will not.
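
An illustration of the single-slot point: $~ only ever holds the last
match in the frame, so each new match overwrites the previous one.

"abc" =~ /a/
first = $~       # MatchData for the first match
"abc" =~ /c/
$~[0]            # => "c": the backref slot holds only the *last* match;
                 # `first` is now the only reference to the earlier one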

>> While the examples here are contrived, since there is no way to "ban" them
>> from Ruby, the compilation strategies have to be robust enough to be
>> correct.
>
> Considering that they're so rare, it's ok to do extreme deopts to take care
> of them.

And I am not opposed to banning them in an opt-in way.

>> I haven't investigated aliasing yet ... but I suspect it introduces
>> further challenges.
>
> I think that aliasing dangerous methods happens so rarely that flushing all
> of the bytecode in that case is an acceptable deopt.

It is incredibly rare, to the point of being nonexistent. JRuby
essentially had a *hard* failure case if you ever aliased 'eval', and
nobody reported problems for two years, despite many production Rails
deployments.
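
The pathological case in question, sketched (hypothetical name):

class Sneaky
  alias_method :run, :eval   # hides eval behind another name

  def probe
    x = 1
    run("x + 1")             # still eval, evaluating in probe's frame
  end
end

Sneaky.new.probe   # => 2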

- Charlie
