Alex Karasulu wrote:
-o- The Big Picture -o-
We have a situation where we need to securely and dynamically load classes that are pulled down as artifact jars from a remote repository. We want to isolate the classes loaded from repository jar artifacts within separate nested ClassLoaders that clearly model API, SPI and implementation artifact dependencies, thereby removing the potential for class collisions.
The repository is responsible for pulling down jars into a local cache as they are requested, so ClassLoaders can be assembled from those locally cached jar artifacts. But when no repository exists yet, we need to pull down some artifacts just to start the repository up. This is where we have a chicken-and-egg problem.
Let's not lose sight of our ultimate goal: to have a small kernel
bootstrapping jar. Take this bootstrap jar, add some water and it should
blossom into a nice kernel by going through a set of bootstrapping stages
that pull down information and artifacts from the repository.
When the stages of bootstrapping are analyzed we realize that the Repository
itself must be used to bootstrap the kernel. The repository must give us
the locally cached artifacts used to assemble the Kernel's ClassLoaders. In
many respects it makes sense to ask the repository to build a ClassLoader
hierarchy on our behalf, given any artifact.
I like the idea of making ClassLoader construction a service provided by the
repository. It's nice!
This is where the power to query the repository for POM information becomes critical. When the repository is asked for a jar artifact it should return the ClassLoader for that artifact, not a simple yes/no answer. That ClassLoader, regardless of the nesting scheme used, should provide all of the runtime jar dependencies associated with the original jar artifact requested.
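To make the nesting concrete, here is a minimal Java sketch of the kind of chain being described, built from jars already sitting in a local cache. The cache location and jar names are invented for illustration only; they are not part of any actual proposal.

import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;

// Minimal sketch of an api -> spi -> impl ClassLoader chain assembled from
// locally cached jar artifacts.  All paths and jar names are hypothetical.
public class LoaderChainSketch
{
    public static void main( String[] args ) throws Exception
    {
        File cache = new File( System.getProperty( "user.home" ), ".avalon/cache" );

        ClassLoader api = new URLClassLoader(
            new URL[]{ new File( cache, "avalon-kernel-api-1.0.jar" ).toURL() },
            ClassLoader.getSystemClassLoader() );

        ClassLoader spi = new URLClassLoader(
            new URL[]{ new File( cache, "avalon-kernel-spi-1.0.jar" ).toURL() },
            api );

        ClassLoader impl = new URLClassLoader(
            new URL[]{ new File( cache, "avalon-kernel-impl-1.0.jar" ).toURL() },
            spi );

        // classes loaded through the impl loader can see api and spi classes,
        // while api and spi clients never see the implementation classes
    }
}

The point is simply that the repository, not the caller, would be the one assembling such a chain.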
I think that using a POM as the source of information for a ClassLoader definition is feasible, but I have reservations about directly accessing a POM. First, a POM contains build-time and test-time dependencies which are not necessarily the same as runtime dependencies. Second, runtime dependencies can involve policy decisions that are not expressed in the POM model as it stands today.
However, we can use a POM to generate classloader construction criteria. I've been playing around with Jelly to automate the generation of classloader criteria using dependency properties.
For example, the following dependency declaration includes a property named "avalon.classloader" which contains the name of the classloader category into which the dependency should be loaded.
  <dependency>
    <groupId>avalon-util</groupId>
    <artifactId>avalon-util-defaults</artifactId>
    <version>1.0-dev</version>
    <properties>
      <avalon.classloader>impl</avalon.classloader>
    </properties>
  </dependency>

This approach assumes specific property values such as "api", "spi" and "impl". Using this information it's possible to generate the classloader criteria in the form of a flat properties file, which keeps things small in terms of footprint. This process could be packaged into a plugin for convenience. However, there is a disadvantage: we would be duplicating information that already exists within the POM.
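As a sketch of how the generated criteria might be consumed, assume the plugin emits a flat properties file mapping an artifact spec to its category. The file name and the key format below are assumptions, not an agreed convention.

import java.io.FileInputStream;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Properties;

// Sketch: read a generated criteria file and partition artifacts into the
// api/spi/impl categories.  The assumed format is one entry per artifact,
// e.g.  avalon-util/avalon-util-defaults/1.0-dev = impl
public class ClassLoaderCriteria
{
    public static void main( String[] args ) throws Exception
    {
        Properties criteria = new Properties();
        criteria.load( new FileInputStream( "classloader.properties" ) );

        List api = new ArrayList();
        List spi = new ArrayList();
        List impl = new ArrayList();

        Iterator keys = criteria.keySet().iterator();
        while( keys.hasNext() )
        {
            String artifact = (String) keys.next();
            String category = criteria.getProperty( artifact );
            if( "api".equals( category ) ) api.add( artifact );
            else if( "spi".equals( category ) ) spi.add( artifact );
            else if( "impl".equals( category ) ) impl.add( artifact );
        }

        // each list now holds the artifacts destined for one loader in the
        // api -> spi -> impl chain
    }
}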
What implications does this have for the Repository? The repository, when given an artifact, must determine the chain of dependencies by querying a POM. It then uses this information to download the required jars into the local cache in the appropriate structure. The cached jars are used to construct the ClassLoader, with the appropriate parents, for the requested jar artifact.
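In interface terms, the service being described might look roughly like this. The type and method names are illustrative only (the ArtifactDescriptor is the one referred to further down), not a concrete API proposal.

// Hypothetical sketch only -- names are illustrative, not the actual API.

// Identifies an artifact held by the repository.
class ArtifactDescriptor
{
    final String m_group;
    final String m_name;
    final String m_version;

    ArtifactDescriptor( String group, String name, String version )
    {
        m_group = group;
        m_name = name;
        m_version = version;
    }
}

// The repository acting as a ClassLoader factory.
interface Repository
{
    /**
     * Resolve the artifact and its runtime dependencies into the local
     * cache and return a ClassLoader (chained to api/spi parents) that
     * exposes the requested artifact.
     */
    ClassLoader getClassLoader( ArtifactDescriptor descriptor ) throws Exception;
}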
The bottom line: the remote repository needs to become more intelligent.
For the time being we can mimic this intelligence by laying out some
descriptor artifacts within the repository and building the logic into
the client API, effectively mimicking a queryable (is that a word?)
repository implementation. If we create a good repository SPI then the fat
repository implementation can be traded in later for a thinner one that can
talk to the repository in a better language. Yeah, I think that's LDAP, so
what! ;-)
:-)
Let's watch our expression semantics and be absolutely clear. Bootstrapping
in the general sense, for kernels and any other repository-dependent
applications, occurs through the repository. The repository is the general
bootstrapping API.
+1
Now bootstrapping the repository is a completely different endeavor altogether; however, it can benefit from the functionality used to make the repository serve as the general bootstrapping API. The difference between repository bootstrapping and the general bootstrapping mechanism is the need for some seed information to get the process going. This information can easily be embedded into the repository API or the API jar.
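For example, the seed data could ride inside the API jar as a simple resource. The resource name and the kind of keys suggested in the comment below are assumptions made purely for illustration.

import java.io.InputStream;
import java.util.Properties;

// Sketch: seed information needed to bootstrap the repository itself,
// packaged as a resource inside the repository API jar.  The resource name
// and keys (remote host list, cache location, etc.) are assumptions.
public class BootstrapSeed
{
    public static Properties load() throws Exception
    {
        InputStream in =
            BootstrapSeed.class.getResourceAsStream( "/repository.seed.properties" );
        if( in == null )
        {
            throw new IllegalStateException( "missing bootstrap seed resource" );
        }
        Properties seed = new Properties();
        try
        {
            seed.load( in );
        }
        finally
        {
            in.close();
        }
        return seed;
    }
}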
Yep.
For the generic bootstrapping framework to be used by Merlin or any other repository-aware application, we need to devise some conventions around its use. To think about the conventions we need, let's start by looking into the use cases for Merlin.
If Merlin is a repository-aware application it resides within the repository
and so do its dependent artifacts. The repository stores the project
information for Merlin as well as its dependencies. The dependency tree can
be determined, and a nested ClassLoader structure can be assembled from this
information to safely build and run the Merlin kernel. Given the way
ClassLoaders work it is best to provide a top-level factory interface
for your application and its embedding API. That way anything using the
top-level factory automatically creates objects that inherit the factory's
ClassLoader. The factory method design pattern can be used elegantly to
cross ClassLoaders, separating the API from the implementation classes.
Using this pattern, a Factory interface (the repository-aware application's
embedding API part) is used to cross into the implementation
ClassLoader and make calls against the factory implementation (the
repository-aware application's embedding implementation part). The
implementation factory then creates concrete implementation products within
the implementation ClassLoader, so the concrete product classes remain
isolated in the ClassLoader of the implementation factory.
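A bare-bones sketch of that pattern, using the Kernel/KernelFactory names from this discussion; the concrete factory class name passed to loadClass is made up.

// Sketch of the factory pattern used to cross from the API ClassLoader into
// the implementation ClassLoader.  Only the two interfaces live in the API
// loader; the concrete factory and its products live in the impl loader.

// visible through the API ClassLoader
interface Kernel
{
    void startup() throws Exception;
}

// also visible through the API ClassLoader
interface KernelFactory
{
    Kernel createKernel() throws Exception;
}

class FactoryBootstrap
{
    static KernelFactory bootstrap( ClassLoader implLoader ) throws Exception
    {
        // the concrete factory class is only visible to the impl loader, so
        // every object it creates inherits that loader
        Class clazz = implLoader.loadClass( "org.example.DefaultKernelFactory" );
        return (KernelFactory) clazz.newInstance();
    }
}

Everything the caller ever touches is typed against the API interfaces, while the concrete product classes stay locked inside the implementation loader.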
OK so far.
The repository-aware application's embedding API must define a special initial factory implementation that acts as a delegate to the real factory implementation. This initial factory should expose a constructor that takes arguments used to determine the underlying implementation, and perhaps even some (implementation-specific) parameters to pass on to the underlying implementation.
Not following the above paragraph too well.
So for Merlin an InitialKernelFactory implements the KernelFactory
interface. Users create an InitialKernelFactory (whose constructor
determines the factory implementation to use based on its arguments). The
InitialKernelFactory requests an implementation's ClassLoader from the
repository.
Just to confirm:
1. we use the repository to establish an api and spi loader
2. we locate (via some parameter) the name of the initial factory interface
3. we locate (via some parameter) the name of the initial factory implementation
4. we perhaps do some manipulation of parameters at this point
5. we invoke a creation request on the initial factory implementation
6. the initial factory implementation uses the repository to construct the impl classloader and load the operational factory, together with the implementation-specific parameters
7. the creation method returns the initial object
Does that sound right?
And the reason the factory creates the impl classloader is that the contents of the impl classloader may be a function of the parameters that we provide to the initial factory.
This request consists of determining the implementation to use, which
results in the creation of an instance of ArtifactDescriptor, a type that is
part of the Repository API. The call to the repository to get the
ClassLoader takes this descriptor as an argument. The repository does its
magic: it queries for implementation dependencies, pulls down dependency and
implementation artifacts and builds the implementation ClassLoader (with SPI
and API parent ClassLoaders in a chain) based on cached jar files. The
InitialKernelFactory then uses this ClassLoader to instantiate the
implementation's concrete Factory which it should know how to do using
reflection. In Merlin's case this might be the MerlinKernelFactory. The
InitialKernelFactory then delegates calls made on the KernelFactory
interface methods to the implementation factory delegate
(the MerlinKernelFactory instance) that was instantiated within the context
of the repository-assembled ClassLoader. Now users of the kernel embedding
API use the InitialKernelFactory as a pass-through to tunnel into the
implementation ClassLoader and make calls against the MerlinKernelFactory.
All factory products returned, Kernel objects for example, are based on the
implementation chosen; in the case of Merlin that would be the MerlinKernel.
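Pulling the pieces together, a sketch of the initial factory delegate might look like the following. It reuses the KernelFactory, Kernel, Repository and ArtifactDescriptor types sketched earlier in this thread; how the repository instance is obtained, the criteria keys, and the rule for mapping criteria onto an implementation artifact are all glossed over and would need to be pinned down in the real design.

import java.util.Map;

// Sketch of the initial factory delegate.  Criteria keys and the
// group:name:version spec format are assumptions for illustration only.
public class InitialKernelFactory implements KernelFactory
{
    private final KernelFactory m_delegate;

    public InitialKernelFactory( Repository repository, Map criteria )
        throws Exception
    {
        // choose the implementation artifact from the supplied criteria
        ArtifactDescriptor descriptor = selectImplementation( criteria );

        // ask the repository to build the impl ClassLoader
        // (with the api/spi parents already chained in)
        ClassLoader loader = repository.getClassLoader( descriptor );

        // reflectively instantiate the concrete factory,
        // e.g. a MerlinKernelFactory, inside the impl loader
        String classname = (String) criteria.get( "kernel.factory.class" );
        Class clazz = loader.loadClass( classname );
        m_delegate = (KernelFactory) clazz.newInstance();
    }

    public Kernel createKernel() throws Exception
    {
        // every product (e.g. a MerlinKernel) comes out of the impl ClassLoader
        return m_delegate.createKernel();
    }

    private ArtifactDescriptor selectImplementation( Map criteria )
    {
        // placeholder: map the supplied criteria onto a concrete artifact
        String spec = (String) criteria.get( "kernel.impl.artifact" );
        String[] parts = spec.split( ":" );   // assumed group:name:version form
        return new ArtifactDescriptor( parts[0], parts[1], parts[2] );
    }
}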
I think I've got it.
If we back off for a moment and look at the big picture, we may have a generalized Avalon Kernel embedding API that can be used for all Kernel implementations: Merlin, Phoenix, et cetera. This all comes down to agreeing upon a common Kernel interface and KernelFactory interface and putting these interfaces into the framework API. If not, we certainly have a generalized repository-aware application bootstrapping API, and that's very valuable in itself.
I think it's too early for a framework-level definition of a kernel - but I do think the repository stuff is heading in the right direction to be a general container-side facility alongside framework and meta.
This is a lot of babble for one email. I will try to break this down into short, understandable chunks; I know I have not done a good job of describing it. I will also try to produce some diagrams showing the use cases and object interaction sequence charts. Once the documentation is complete the implementation will become very apparent. I had already begun an implementation but stopped myself in order to document it and get a consensus.
Yeah if you're thinking this effort needs to be kept in a separate development branch, then you are right. I can also develop it within the sandbox.
Let's put it together in the sandbox (because Steve is a CVS wimp).
Steve, I know I've been taking a while here, but I think it's worth doing this right the first time. I don't want to come back to it and have to redesign it.
No problems here!
The factory handling side is a little fuzzy, but I'm sure that will come together. I'm more concerned about the question of dependency criteria resolution: do we use the POM directly, or do we generate an artifact? For the moment I'm leaning towards generating the artifact from the POM.
That's a pretty big, big picture. Expect more implementation details and documentation from me soon.
Super!
Cheers, Steve.
Alex
--
Stephen J. McConnell mailto:[EMAIL PROTECTED]
