Alan McKean wrote:
Since the lack of Java serialization of JRuby objects stops us dead in
our tracks when trying to hook up our persistence engine, I am
interested in either getting someone on this end to work on it or
jumping in myself. In either case, I need some background on the JRuby
runtime architecture and some guidance on particular issues. The issues
are about how to detach an object from its runtime elements and how to
restore them when the object gets reloaded into memory:
1) When we first tried saving a JRuby object to our database, we saw it
drag along a gaggle of runtime objects. Given that it might be loaded
into a different VM when it is brought in from the database,
is reconnecting the object to a particular runtime important? If so, is
there a way of determining which of the available runtimes would be best
to connect it to?
2) Detaching an object from its 'runtime' variable and making the
'metaclass' variable transient lets us store the object in our database
without dragging much else along. But we need to reconnect things when
the object is reloaded into memory. Is there a canonical name for the
metaclass that we could store in the database along with the instance?
If not, what information is available for reconnecting? We persist type
information in our Java product by storing the fully-qualified name of
the class with the object, then lazily loading and initializing the
connection (using the name) when we reload the object to memory. Will
this work in JRuby?
If someone has thought through a strategy for deserializing a JRuby
object and restoring its connections to its runtime, I would love to
hear about it.
I've been looking into this a bit tonight. This email represents me
rambling.
Marking metaclass and finalizer as transient are no-brainers. I'm going
to go ahead and commit that.
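A minimal sketch of the "transient metaclass plus stored name" idea from the question, using plain Java serialization. Every type here (SketchRuntime, SketchMetaClass, SketchObject) is a hypothetical stand-in, not a JRuby class, and the lookup-by-name step assumes the receiving runtime already defines a class with the same name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a metaclass; not JRuby's RubyClass.
final class SketchMetaClass {
    final String name;
    SketchMetaClass(String name) { this.name = name; }
}

// Hypothetical stand-in for a runtime that roots metaclasses by name.
final class SketchRuntime {
    private final Map<String, SketchMetaClass> classes = new ConcurrentHashMap<>();

    SketchMetaClass defineClass(String name) {
        return classes.computeIfAbsent(name, SketchMetaClass::new);
    }

    SketchMetaClass findClass(String name) {
        return classes.get(name);
    }
}

// Hypothetical object: only the class name and payload reach the stream.
final class SketchObject implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String metaClassName;          // persisted with the instance
    private final String payload;                // persisted with the instance
    private transient SketchMetaClass metaClass; // skipped; null after reload

    SketchObject(SketchMetaClass metaClass, String payload) {
        this.metaClass = metaClass;
        this.metaClassName = metaClass.name;
        this.payload = payload;
    }

    String payload() { return payload; }

    // Lazily reconnect by name against the runtime the caller supplies.
    // Once connected, the cached metaclass is returned as-is.
    SketchMetaClass metaClass(SketchRuntime runtime) {
        if (metaClass == null) {
            metaClass = runtime.findClass(metaClassName);
        }
        return metaClass;
    }

    // Serialize to bytes and read back, as a database round trip would.
    static SketchObject roundTrip(SketchObject original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SketchObject) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The reloaded copy carries no reference to the runtime that created it; which runtime it joins is decided by whoever first calls metaClass(runtime).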
I'm going to see what would be needed to remove getRuntime everywhere
it's needed. It would be a big job...be back in a few minutes...
...ok I'm back. I think it's doable. Here are some more rambling thoughts.
The runtime connection is used for a few things:
1. to construct other objects
This is mostly a self-fulfilling prophecy. Objects require a runtime
when they're created, so all objects need runtime available to create
objects. If we break that chain, a number of places that depend on
runtime disappear.
2. to locate classes in order to construct objects
This is a little harder to eliminate. In order to construct a Ruby
"String" object, you need to have access to the "String" metaclass. That
means having access to the place where the "String" metaclass is stored,
currently in the runtime. Again, this is largely self-fulfilling; you
need access to a metaclass to construct an object, so you need to locate
the metaclass, and since the metaclasses are currently rooted in the
runtime, you need the runtime. But the runtime dependency is largely
peripheral to the use case.
3. to access runtime-global and thread-local data at execution time
This is probably the hardest to eliminate. Every thread Ruby code
creates or encounters is associated with a ThreadContext, which contains
extra thread-local state needed for executing Ruby code. Every external
thread that touches a given runtime is "adopted" and given a
ThreadContext and a Ruby "Thread" avatar to represent it. So a given
Java thread may have many Ruby "Thread" and "ThreadContext" objects
associated with it, one pair per runtime it has touched. This allows us
to share threads across runtimes, rather than having a given thread bound
to a given runtime exclusively, as in many other JVM languages. But it also
requires that we locate the runtime, and therefore the ThreadContext, in
a different way. Therefore, we have the runtime dependency.
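The adoption scheme described above can be sketched with one ThreadLocal per runtime. This is only an illustration under stated assumptions: AdoptingRuntime and SketchThreadContext are hypothetical names, and the real ThreadContext holds far more state than this.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical per-thread execution state; not JRuby's ThreadContext.
final class SketchThreadContext {
    final Thread javaThread;
    final AdoptingRuntime runtime;

    SketchThreadContext(Thread javaThread, AdoptingRuntime runtime) {
        this.javaThread = javaThread;
        this.runtime = runtime;
    }
}

// Hypothetical runtime that adopts any Java thread that touches it.
final class AdoptingRuntime {
    // One ThreadLocal per runtime instance, so a single Java thread gets
    // a separate context for every runtime it has touched.
    private final ThreadLocal<SketchThreadContext> contexts =
            ThreadLocal.withInitial(
                    () -> new SketchThreadContext(Thread.currentThread(), this));

    // Finding the context requires the runtime first: this is the
    // dependency the paragraph above describes.
    SketchThreadContext currentContext() {
        return contexts.get();
    }

    // Helper for demonstration: the context a newly started thread sees.
    SketchThreadContext contextOnNewThread() {
        AtomicReference<SketchThreadContext> seen = new AtomicReference<>();
        Thread worker = new Thread(() -> seen.set(currentContext()));
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.get();
    }
}
```

Calling currentContext() on two different AdoptingRuntime instances from the same thread yields two distinct contexts, which is what lets threads be shared across runtimes.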
This essentially sums up all the major reasons why we have so many
dependencies in code on access to a runtime object. And ultimately,
requiring access to a runtime object obliterates the possibility of
third-party manipulation and transport of Ruby objects.
So to summarize, the three actual reasons we depend on runtime being
present are as follows:
1. to access and maintain types associated with a specific ruby worldspace
2. to access and maintain state associated with a specific ruby worldspace
3. to provide execution state and primitives for code running in a
specific ruby worldspace
Now let's rewrite the list by substituting in a different concept for
our top-level ruby worldspace:
1. to access and maintain types associated with a specific ClassLoader
2. to access and maintain state associated with a specific ClassLoader
3. to provide execution state and primitives for code running in a
specific ClassLoader
So let's examine how we'd solve these issues.
First off, IRubyObject.getRuntime(). Let's assume that the classloader
that loads the Ruby class is our chosen, ultimate worldspace:
    public Ruby getRuntime() {
        // Assumes the loader that defined Ruby is the JRubyClassLoader proposed here.
        JRubyClassLoader cl = (JRubyClassLoader) Ruby.class.getClassLoader();
        return cl.getRuntime();
    }
Everything else largely falls out of this. Starting up a new instance of
JRuby largely becomes the act of constructing the top-level classloader
in which it will live and telling it to "go".
    JRubyClassLoader cl = new JRubyClassLoader(..., properties);
    cl.evalScript("puts 'hello'", "(eval)");
Everything lives underneath the classloader, and since all classes have
access to that classloader, all code can retrieve the runtime associated
with it.
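A self-contained sketch of the lookup, assuming a hypothetical WorldClassLoader that owns one runtime. It differs from the getRuntime() snippet above in one respect, which is an assumption of this sketch and not part of the proposal: it walks up the parent chain instead of casting directly, so code loaded by a child loader still finds its world. LoaderRuntime stands in for the real runtime class.

```java
// Hypothetical runtime object; stands in for the real runtime class.
final class LoaderRuntime {
    final String label;
    LoaderRuntime(String label) { this.label = label; }
}

// Hypothetical classloader that owns exactly one runtime.
final class WorldClassLoader extends ClassLoader {
    private final LoaderRuntime runtime;

    WorldClassLoader(ClassLoader parent, String label) {
        super(parent);
        this.runtime = new LoaderRuntime(label);
    }

    LoaderRuntime getRuntime() {
        return runtime;
    }

    // Walk from a loader up through its parents to the nearest world.
    // Returns null when no WorldClassLoader is in the chain.
    static LoaderRuntime runtimeFor(ClassLoader loader) {
        for (ClassLoader cl = loader; cl != null; cl = cl.getParent()) {
            if (cl instanceof WorldClassLoader) {
                return ((WorldClassLoader) cl).getRuntime();
            }
        }
        return null;
    }

    // Convenience form: the runtime for whichever loader defined a class.
    // Bootstrap classes have a null loader and therefore no runtime.
    static LoaderRuntime runtimeForClass(Class<?> type) {
        return runtimeFor(type.getClassLoader());
    }
}
```

The null result for classes outside any world is exactly the case a deserializer running in a third-party classloader would hit, which is why the serialization question remains open.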
Would something like this work? The view from inside the classloader
seems pretty reasonable...we already have this root context and
partitioning as part of Java's classloader support, and it seems fairly
natural to use it. But I'm not well-enough versed in Java serialization
to know if this will solve our deserialization issues. It may require
you to have more control over the object stream...but of course if you
have control over the object stream, you could also just have it ask a
specific runtime to unmarshal objects, avoiding the issue completely.
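The "control over the object stream" alternative can be sketched with standard Java serialization hooks: a stream subclass carries the target runtime, and the object's private readObject picks it up. RuntimeObjectInputStream, StreamRuntime, and StreamedObject are hypothetical names for illustration only; nothing here is existing JRuby API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical runtime object; stands in for the real runtime class.
final class StreamRuntime {
    final String label;
    StreamRuntime(String label) { this.label = label; }
}

// Hypothetical stream that knows which runtime reloaded objects should join.
final class RuntimeObjectInputStream extends ObjectInputStream {
    private final StreamRuntime runtime;

    RuntimeObjectInputStream(InputStream in, StreamRuntime runtime) throws IOException {
        super(in);
        this.runtime = runtime;
    }

    StreamRuntime getRuntime() { return runtime; }
}

// Hypothetical object whose runtime link is transient and re-attached on read.
final class StreamedObject implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String payload;
    private transient StreamRuntime runtime;

    StreamedObject(StreamRuntime runtime, String payload) {
        this.runtime = runtime;
        this.payload = payload;
    }

    String payload() { return payload; }
    StreamRuntime runtime() { return runtime; }

    // Standard serialization hook: restore fields, then ask the stream
    // for a runtime. With a plain ObjectInputStream, runtime stays null.
    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        if (in instanceof RuntimeObjectInputStream) {
            runtime = ((RuntimeObjectInputStream) in).getRuntime();
        }
    }

    // Serialize, then reload through a stream bound to the target runtime.
    static StreamedObject copyInto(StreamedObject original, StreamRuntime target) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (RuntimeObjectInputStream in = new RuntimeObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()), target)) {
                return (StreamedObject) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

This only helps when the persistence layer lets the caller choose the ObjectInputStream subclass; a third-party engine that constructs its own stream would leave the runtime field null.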
Thoughts? More ideas?
- Charlie