Alex Karasulu wrote:
-o- The Big Picture -o-
We have a situation where we need to securely and dynamically load classes that are pulled down as artifact jars from a remote repository. We want to isolate the classes loaded from repository jar artifacts within separate nested ClassLoaders that clearly model API, SPI and implementation artifact dependencies, thereby removing the potential for class collisions.
The repository is responsible for pulling down jars into a local cache as they are requested, so ClassLoaders can be assembled from those locally cached jar artifacts. But when no repository exists yet, we need to pull down some artifacts just to start the repository up. This is where we have a chicken-and-egg problem.
Let's not lose sight of our ultimate goal: to have a small kernel
bootstrapping jar. Take this bootstrap jar, add some water and it should
blossom into a nice kernel by going through a set of bootstrapping stages
that pull down information and artifacts from the repository.
When the stages of bootstrapping are analyzed we realize that the Repository
itself must be used to bootstrap the kernel. The repository must give us
the locally cached artifacts used to assemble the Kernel's ClassLoaders. In
many respects it makes sense to ask the repository to build a ClassLoader
hierarchy on our behalf, given any artifact.
I like the idea of making ClassLoader construction a service provided by the
repository. It's nice!
This is where the power to query the repository for POM information becomes critical. When the repository is asked for a jar artifact it should return the ClassLoader for that artifact, not a simple yes/no answer. That ClassLoader, regardless of the nesting scheme used, should provide all of the runtime jar dependencies associated with the original jar artifact requested.
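To make the nesting concrete, here is a minimal Java sketch of the kind of chain being described, built from jars already sitting in a local cache. The cache location and jar names are invented for illustration only; they are not part of any actual proposal.

import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;

// Minimal sketch of an api -> spi -> impl ClassLoader chain assembled from
// locally cached jar artifacts.  All paths and jar names are hypothetical.
public class LoaderChainSketch
{
    public static void main( String[] args ) throws Exception
    {
        File cache = new File( System.getProperty( "user.home" ), ".avalon/cache" );

        ClassLoader api = new URLClassLoader(
            new URL[]{ new File( cache, "avalon-kernel-api-1.0.jar" ).toURL() },
            ClassLoader.getSystemClassLoader() );

        ClassLoader spi = new URLClassLoader(
            new URL[]{ new File( cache, "avalon-kernel-spi-1.0.jar" ).toURL() },
            api );

        ClassLoader impl = new URLClassLoader(
            new URL[]{ new File( cache, "avalon-kernel-impl-1.0.jar" ).toURL() },
            spi );

        // classes loaded through the impl loader can see api and spi classes,
        // while api and spi clients never see the implementation classes
    }
}

The point is simply that the repository, not the caller, would be the one assembling such a chain.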
I think that using a POM as the source of information for a ClassLoader definition is feasible, but I have reservations about directly accessing a POM. First, a POM contains build-time and test-time dependencies which are not necessarily the same as runtime dependencies. Second, runtime dependencies can involve policy decisions that are not expressed in the POM model as it stands today.
However, we can use a POM to generate classloader construction criteria. I've been playing around with Jelly to automate the generation of classloader criteria using dependency properties.
For example, the following dependency declaration includes a property named "avalon.classloader" which contains the name of the classloader category into which the dependency should be loaded.
  <dependency>
    <groupId>avalon-util</groupId>
    <artifactId>avalon-util-defaults</artifactId>
    <version>1.0-dev</version>
    <properties>
      <avalon.classloader>impl</avalon.classloader>
    </properties>
  </dependency>

This approach assumes specific property values such as "api", "spi" and "impl". Using this information it's possible to generate the classloader criteria in the form of a flat properties file, which keeps things small in terms of footprint. This process could be packaged into a plugin for convenience. However, there is a disadvantage: we would be duplicating information that already exists within the POM.
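As a sketch of how the generated criteria might be consumed, assume the plugin emits a flat properties file mapping an artifact spec to its category. The file name and the key format below are assumptions, not an agreed convention.

import java.io.FileInputStream;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Properties;

// Sketch: read a generated criteria file and partition artifacts into the
// api/spi/impl categories.  The assumed format is one entry per artifact,
// e.g.  avalon-util/avalon-util-defaults/1.0-dev = impl
public class ClassLoaderCriteria
{
    public static void main( String[] args ) throws Exception
    {
        Properties criteria = new Properties();
        criteria.load( new FileInputStream( "classloader.properties" ) );

        List api = new ArrayList();
        List spi = new ArrayList();
        List impl = new ArrayList();

        Iterator keys = criteria.keySet().iterator();
        while( keys.hasNext() )
        {
            String artifact = (String) keys.next();
            String category = criteria.getProperty( artifact );
            if( "api".equals( category ) ) api.add( artifact );
            else if( "spi".equals( category ) ) spi.add( artifact );
            else if( "impl".equals( category ) ) impl.add( artifact );
        }

        // each list now holds the artifacts destined for one loader in the
        // api -> spi -> impl chain
    }
}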
What implications does this have for the Repository? The repository, when given an artifact, must determine the chain of dependencies by querying a POM. It then uses this information to download the required jars into the local cache in the appropriate structure. The cached jars are used to construct the ClassLoader, with the appropriate parents, for the requested jar artifact.
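In interface terms, the service being described might look roughly like this. The type and method names are illustrative only (the ArtifactDescriptor is the one referred to further down), not a concrete API proposal.

// Hypothetical sketch only -- names are illustrative, not the actual API.

// Identifies an artifact held by the repository.
class ArtifactDescriptor
{
    final String m_group;
    final String m_name;
    final String m_version;

    ArtifactDescriptor( String group, String name, String version )
    {
        m_group = group;
        m_name = name;
        m_version = version;
    }
}

// The repository acting as a ClassLoader factory.
interface Repository
{
    /**
     * Resolve the artifact and its runtime dependencies into the local
     * cache and return a ClassLoader (chained to api/spi parents) that
     * exposes the requested artifact.
     */
    ClassLoader getClassLoader( ArtifactDescriptor descriptor ) throws Exception;
}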
The bottom line: the remote repository needs to become more intelligent.
For the time being we can mimic this intelligence by laying out some
descriptor artifacts within the repository and building the logic into
the client API, effectively mimicking a queryable (is that a word?)
repository implementation. If we create a good repository SPI then the fat
repository implementation can be traded in later for a thinner one that can
talk to the repository in a better language. Yeah, I think that's LDAP, so
what! ;-)
:-)
Let's watch our expression semantics and be absolutely clear. Bootstrapping
in the general sense, for kernels and any other repository-dependent
applications, occurs through the repository. The repository is the general
bootstrapping API.
+1
Now bootstrapping the repository is a completely different endeavor altogether; however, it can benefit from the functionality used to make the repository serve as the general bootstrapping API. The difference between repository bootstrapping and the general bootstrapping mechanism is the need for some seed information to get the process going. This information can easily be embedded into the repository API or the API jar.
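For example, the seed data could ride inside the API jar as a simple resource. The resource name and the kind of keys suggested in the comment below are assumptions made purely for illustration.

import java.io.InputStream;
import java.util.Properties;

// Sketch: seed information needed to bootstrap the repository itself,
// packaged as a resource inside the repository API jar.  The resource name
// and keys (remote host list, cache location, etc.) are assumptions.
public class BootstrapSeed
{
    public static Properties load() throws Exception
    {
        InputStream in =
            BootstrapSeed.class.getResourceAsStream( "/repository.seed.properties" );
        if( in == null )
        {
            throw new IllegalStateException( "missing bootstrap seed resource" );
        }
        Properties seed = new Properties();
        try
        {
            seed.load( in );
        }
        finally
        {
            in.close();
        }
        return seed;
    }
}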
Yep.
For the generic bootstrapping framework to be used by Merlin or any other repository-aware application, we need to devise some conventions around its use. To think about the conventions we need, let's start by looking into the use cases for Merlin.
If Merlin is a repository-aware application it resides within the repository
and so do its dependent artifacts. The repository stores the project
information for Merlin as well as its dependencies. The dependency tree can
be determined, and a nested ClassLoader structure can be assembled from this
information to safely build and run the Merlin kernel. Given the way
ClassLoaders work it is best to provide a top-level factory interface
for your application and its embedding API. That way anything using the
top-level factory automatically creates objects that inherit the factory's
ClassLoader. The factory method design pattern can be used elegantly to
cross ClassLoaders, separating the API from the implementation classes.
Using this pattern, a Factory interface (the repository-aware application's
embedding API part) is used to cross into the implementation
ClassLoader and make calls against the factory implementation (the
repository-aware application's embedding implementation part). The
implementation factory then creates concrete implementation products within
the implementation ClassLoader, so the concrete product classes remain
isolated in the ClassLoader of the implementation factory.
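A bare-bones sketch of that pattern, using the Kernel/KernelFactory names from this discussion; the concrete factory class name passed to loadClass is made up.

// Sketch of the factory pattern used to cross from the API ClassLoader into
// the implementation ClassLoader.  Only the two interfaces live in the API
// loader; the concrete factory and its products live in the impl loader.

// visible through the API ClassLoader
interface Kernel
{
    void startup() throws Exception;
}

// also visible through the API ClassLoader
interface KernelFactory
{
    Kernel createKernel() throws Exception;
}

class FactoryBootstrap
{
    static KernelFactory bootstrap( ClassLoader implLoader ) throws Exception
    {
        // the concrete factory class is only visible to the impl loader, so
        // every object it creates inherits that loader
        Class clazz = implLoader.loadClass( "org.example.DefaultKernelFactory" );
        return (KernelFactory) clazz.newInstance();
    }
}

Everything the caller ever touches is typed against the API interfaces, while the concrete product classes stay locked inside the implementation loader.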
OK so far.
The repository-aware application's embedding API must define a special initial factory implementation that acts as a delegate to the real factory implementation. This initial factory should expose a constructor that takes arguments used to determine the underlying implementation, and perhaps even some (implementation-specific) parameters to pass on to the underlying implementation.
Not following the above paragraph too well.
So for Merlin an InitialKernelFactory implements the KernelFactory
interface. Users create an InitialKernelFactory (whose constructor
determines the factory implementation to use based on its arguments). The
InitialKernelFactory requests an implementation's ClassLoader from the
repository.
Just to confirm:
1. we use the repository to establish an api and spi loader
2. we locate (via some parameter) the name of the initial factory interface
3. we locate (via some parameter) the name of the initial factory implementation
4. we perhaps do some manipulation of parameters at this point
5. we invoke a creation request on the initial factory implementation
6. the initial factory implementation uses the repository to construct the impl classloader and load the operational factory, together with the implementation-specific parameters
7. the creation method returns the initial object
Does that sound right?
And the reason the factory creates the impl classloader is that the contents of the impl classloader may be a function of the parameters that we provide to the initial factory.
This request consists of determining the implementation to use, which
results in the creation of an instance of ArtifactDescriptor, a type that is
part of the Repository API. The call to the repository to get the
ClassLoader takes this descriptor as an argument. The repository does its
magic: it queries for implementation dependencies, pulls down dependency and
implementation artifacts and builds the implementation ClassLoader (with SPI
and API parent ClassLoaders in a chain) based on cached jar files. The
InitialKernelFactory then uses this ClassLoader to instantiate the
implementation's concrete Factory which it should know how to do using
reflection. In Merlin's case this might be the MerlinKernelFactory. The
InitialKernelFactory then delegates calls made on the KernelFactory
interface methods to the implementation factory delegate
(the MerlinKernelFactory instance) that was instantiated within the context
of the repository-assembled ClassLoader. Now users of the kernel embedding
API use the InitialKernelFactory as a pass-through to tunnel into the
implementation ClassLoader and make calls against the MerlinKernelFactory.
All factory products returned, Kernel objects for example, are based on the
implementation chosen; in the case of Merlin that would be the MerlinKernel.
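Pulling the pieces together, a sketch of the initial factory delegate might look like the following. It reuses the KernelFactory, Kernel, Repository and ArtifactDescriptor types sketched earlier in this thread; how the repository instance is obtained, the criteria keys, and the rule for mapping criteria onto an implementation artifact are all glossed over and would need to be pinned down in the real design.

import java.util.Map;

// Sketch of the initial factory delegate.  Criteria keys and the
// group:name:version spec format are assumptions for illustration only.
public class InitialKernelFactory implements KernelFactory
{
    private final KernelFactory m_delegate;

    public InitialKernelFactory( Repository repository, Map criteria )
        throws Exception
    {
        // choose the implementation artifact from the supplied criteria
        ArtifactDescriptor descriptor = selectImplementation( criteria );

        // ask the repository to build the impl ClassLoader
        // (with the api/spi parents already chained in)
        ClassLoader loader = repository.getClassLoader( descriptor );

        // reflectively instantiate the concrete factory,
        // e.g. a MerlinKernelFactory, inside the impl loader
        String classname = (String) criteria.get( "kernel.factory.class" );
        Class clazz = loader.loadClass( classname );
        m_delegate = (KernelFactory) clazz.newInstance();
    }

    public Kernel createKernel() throws Exception
    {
        // every product (e.g. a MerlinKernel) comes out of the impl ClassLoader
        return m_delegate.createKernel();
    }

    private ArtifactDescriptor selectImplementation( Map criteria )
    {
        // placeholder: map the supplied criteria onto a concrete artifact
        String spec = (String) criteria.get( "kernel.impl.artifact" );
        String[] parts = spec.split( ":" );   // assumed group:name:version form
        return new ArtifactDescriptor( parts[0], parts[1], parts[2] );
    }
}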
I think I've got it.
If we back off for a moment and look at the big picture, we may have a generalized Avalon Kernel embedding API that can be used for all Kernel implementations: Merlin, Phoenix, et cetera. This all comes down to agreeing upon a common Kernel interface and KernelFactory interface and putting these interfaces into the framework API. If not, we certainly have a generalized repository-aware application bootstrapping API, and that's very valuable in itself.
I think it's too early for a framework-level definition of a kernel - but I do think the repository stuff is heading in the right direction to be a general container-side facility alongside framework and meta.
This is a lot of babble for one email. I will try to break this down into short, understandable chunks; I know I have not done a good job of describing it. I will also try to produce some diagrams showing the use cases and object interaction sequence charts. Once the documentation is complete the implementation will become very apparent. I had already begun an implementation but stopped myself in order to document it and get a consensus.
Yeah if you're thinking this effort needs to be kept in a separate development branch, then you are right. I can also develop it within the sandbox.
Let's put it together in the sandbox (because Steve is a CVS wimp).
Steve, I know I've been taking a while here, but I think it's worth doing this right the first time. I don't want to come back to it and have to redesign it.
No problems here!
The factory handling side is a little fuzzy, but I'm sure that will come together. I'm more concerned about the question of dependency criteria resolution: do we use the POM directly, or do we generate an artifact? For the moment I'm leaning towards generating the artifact from the POM.
That's a pretty big, big picture. Expect more implementation details and documentation from me soon.
Super!
Cheers, Steve.
Alex
--
Stephen J. McConnell mailto:[EMAIL PROTECTED]
