[jvm-l] Re: Dynlang caching strategies

Charles Oliver Nutter Tue, 09 Sep 2008 16:13:24 -0700

Jochen Theodorou wrote:
> We had in Groovy a per class cache for method lookups. But the cache was 
> too slow (the generation of the key did cost too much) and if we have 
> inline caches we don't need that kind of caches. Now I don't what kind 
> of relation your cache uses. I guess you use a (name,argument number) 
> type of key and then react based on the receiver in the general case and 
> maybe a precalculated number in special cases. If that is right, then of 
> course you have to check all caches if a class changes... combined with 
> synchronization this looks like something that could be better if 
> someone had a bright idea


We largely had the same opinion, that once the inline cache was 
installed the per-class cache was unnecessary. But we were wrong! We now 
realize we either need a couple special cache sites or a per-class cache 
to support cases like send (call an arbitrary method by name), 
method_missing (respond to a missing method), and respond_to? (check if 
a target object defines a method of a given name). These three cases see 
no gain from a standard invocation inline cache and as such are forced 
to repeatedly do full-hierarchy searches. This is a large part of our 
motivation to build a version-sensitive per-class cache, since for send 
and method_missing we have megamorphic call sites and for respond_to? we 
need to be fast at *failing* to find a method as well as *succeeding*

>> We keep references to them in a JRuby-runtime-global map, 
>> and when changes happen to the hierarchies from which they came we tell 
>> all sites known to have cached that method to flush themselves. This 
>> works very well so far, but there's a very small chance that under high 
>> concurrent load, if a method is simultaneously being called by one 
>> thread and redefined by another a cache could get stuck with a stale 
>> entry and potentially never flushed. I have not been able to produce 
>> this effect experimentally and it has never been reported, so we have 
>> not made substantial efforts to replace the active flushing logic.
> 
> hehe... we once had a bug report that had its cause in a method object 
> (a custom object wrapping a reflection method object) existing but not 
> being intialized yet... that was a pretty tough one to debug, because of 
> course we where not able to reproduce the problem, because it depends on 
> too many factors that are not equal on the developer machine. NBut maybe 
> class changes do not appear rapidly enough.

In general the cases that would lead to such a bug in JRuby are 
discouraged by Rubyists; specifically, it's frowned upon to be mutating 
world-visible classes in one thread while consuming them in another.

>> An alternative, which I implemented as an experiment, is to instead 
>> calculate the version each time, compositing version information 
>> recursively through the hierarchy. If the per-class cost is low enough, 
>> this does improve performance slightly over a raw search for the method 
>> in all superclasses' method maps. But it doesn't appear to help enough 
>> to warrant the complexity.
> 
> you mean the way I just suggested here? Hmm...

Basically that, yes. The limiting factor was really reducing the cost of 
an "active" version calculation at call time, since the complexity of 
maintaining all class versions "passively" is unrealistic (and difficult 
to reconcile with threading concerns, since you may have many version 
changes happening at the same time across classes and subhierarchies).

> ok let me try to develop a new idea here.... let us say that each time 
> you mutate a class in any way you create a new class an the old class 
> will be flagged as dirty. Let us also say that there is some kind of 
> ClassHandle that stores a reference to the current class. Now a super 
> class may have a listener like structure for each child class. If the 
> super class changes it notifies each child by marking them as dirty and 
> then marks itself as dirty too... I guess this needs to be synchronized. 
> Anyway, the handle is constant and points now to a dirty class. If you 
> do any action involving this class, then you need to create the class 
> first new to get a version that is not dirty. his new version will then 
> again register itself at the parent and update the method handle. In the 
> inline cache you store the tuple receiver class (the goal of the method 
> handle) the method handle, as well as the  handle. Then you would have 
> to check if handle.reference==receiverClass && 
> !handle.reference.isDirty(). If the update did already happen, then that 
> part of the cache is no longer called. I don't know if you have timeouts 
> for your caches or if you keep track which branches has been called to 
> ensure the cache does not blow up... or you change the order to 
> !handle.reference.isDirty() && handle.reference==receiverClass and in 
> case of isDirty you change the cache. method objects do not use the real 
> class, instead they use the handle, meaning that you don't need to 
> recreate them when a class changes. Well I guess that is not really 
> different from the version idea since I now use a reference instead of a 
> number, but it should be interesting to know how this affects the 
> performance. the advantage is that you never need to go through all 
> caches and update them. You also do not need to update classes that are 
> not used. The disadvantage is that you still have synchronization a lot.

More than thread synchronization my fear with any mutating mechanism is 
thread safety. There's a lot of bits and pieces here that would have to 
be synchronized together. There's also the potential cost of code that 
defines methods; for example, since Ruby classes come into being the 
minute they're first "opened" every method definition after that point 
would spin a new class. That would represent a severe startup hit.

However I think I have an idea forming from parts of yours. Part of the 
complexity I ran into with doing an active version calculation at the 
call site was the overhead of recursively walking the superclasses. If 
instead that recursive walking was performed ahead of time and the 
resulting "counters" gathered into a list, it would not require a 
recursive walk and could all take place as a simple iteration. There 
could also be a flag in each counter that indicates a hierarchy change 
(such as a new module being mixed in) which would trigger a new 
gathering of counters throughout the altered hierarchy.

So essentially, parents would not need to notify children of changes 
anymore; they would just notify the smart version counter they contain. 
A new class extending an existing one would aggregate references to 
these counters and use them to perform an active calculation at the call 
site. And since the counters are pre-aggregated, there would be no need 
to walk the tree repeatedly or perform a recursive call over polymorphic 
version calculation implementations.

- Charlie

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "JVM 
Languages" group.
To post to this group, send email to jvm-languages@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/jvm-languages?hl=en
-~----------~----~----~----~------~----~------~--~---

[jvm-l] Re: Dynlang caching strategies

Reply via email to