Building a Jackrabbit roadmap

Jukka Zitting Wed, 14 Feb 2007 09:09:41 -0800

Hi,

There are a lot of things going on around Jackrabbit, and I'd like to
summarize my thoughts on some of the efforts in an attempt to
start drafting a Jackrabbit roadmap that would present our views of
what will likely be happening within the next few years. This roadmap
could then be used to create more detailed release plans and to
evaluate progress over time.


I've organized this message around various topics, in no particular
order. Feedback on any or all topics is very much welcome! Please feel
free to start a new thread if you'd like to focus on some particular
issue in more detail.

JSR 283
-------

The JCR 2.0 spec being defined by JSR 283 will be a major source of
new features to be added in Jackrabbit within the next year or so. The
current plan is to target Jackrabbit 2.0 as the JSR 283 reference
implementation, which means that we'll need to have at least some sort
of an implementation of all the new features being introduced by the
updated spec.

I'd like to use the JSR 283 public review draft as the baseline
requirements document for Jackrabbit 2.0. Once the draft is out we
should create feature requests for all the major changes and start
working on an iterative plan to get all the features implemented by
the 2.0 release. If possible, we should introduce some of the new
features already during the 1.x release cycle, perhaps by including
any API changes in jackrabbit-api instead of depending on any
jcr-2.0.jar snapshots.

Documentation
-------------

Documentation is definitely one of the key "features" that need major
improvements in Jackrabbit. To raise awareness of the need for
documentation, I'm planning to start including selected documentation
tasks in future release plans.

It would also be nice to come up with ways to better tap the amount of
knowledge transfer that is happening on the Jackrabbit mailing lists
every day. I'd especially like to make it easier to contribute FAQ
updates and other similar documentation improvements. What should we
do to help people volunteer with these issues?

SPI
---

The SPI effort has been ongoing for a while already, and it would be
good to come up with a clear idea of how and when we are going to
integrate it with the rest of Jackrabbit. The SPI model introduces a
major architectural change, and I'm worried about the adverse effects
that change may have if the integration with the rest of Jackrabbit is
not properly managed.

I'd most like to see the SPI being introduced as an evolutionary step
to jackrabbit-core rather than using just parts of the current core to
build a revolutionary new jackrabbit-spi-backend implementation.

The applicability of the SPI as an intermediate layer within a local
repository implementation as opposed to a network remoting layer has
also not yet been discussed in much detail. Before integrating the SPI
with the rest of the project it would be good to review the design
decisions both to verify that we're not missing anything and (even
more importantly) to better educate the development community about the
details of the SPI model.

Finally, I'm worried about the duplication of effort in the
implementation of transient space, name mapping, and other client-side
features in spi-client and jackrabbit-core. Moving at least parts of
such code to jackrabbit-jcr-commons would be a good idea.

Remote access
-------------

The demand for fast remote JCR access to Jackrabbit repositories will
most likely become more pressing as more people are using Jackrabbit
in various environments. There are some improvements we could make in
JCR-RMI to speed up things, but the basic design of JCR-RMI places
some hard performance limits especially for high-latency connections.

The SPI and the WebDAV work related to it seem like a nice solution
to this issue. Unfortunately this solution still needs more work and
since JCR-RMI already scratches some of the itch away, I'm not sure
when we'll see concrete results. Perhaps this would be a good area for
co-operation with other repository projects?

Clustering
----------

The recent clustering work by Dominique is a great addition to
Jackrabbit, and I'm looking forward to seeing more success and problem
reports from people using the new clustering features.

Architecturally speaking, the current clustering design is IMHO not
optimal, as it actually makes the normally "dumb" persistence manager
do the hard work by relying on a shared underlying database. I think
we should eventually find a way to manage all the clustering work
above the persistence layer, but just like with remote access there's
no immediate need for improvements as the current solution covers the
basic needs.

Access control
--------------

A number of people have been writing their own custom AccessManagers
to implement access controls in Jackrabbit repositories. This is a
rather heavy investment for many of the basic access control use cases
that could well be met by a more generic AccessManager implementation.
Contributions in this area would probably be widely appreciated.

Performance
-----------

So far performance and scalability issues have come up every now and
then as isolated incidents or discussions, which to me seems good as
it shows that we don't have any glaring performance problems that
every other user would be tripping over. This, however, doesn't mean
that we couldn't do better, and I'm quite convinced that the growing
Jackrabbit install base is sooner or later going to include people
with very strict performance and scalability requirements that can't
easily be met by just throwing in more hardware.

To better prepare for solving such issues it would be good for us to
have some standard performance benchmarks in place. They wouldn't need
to be very complex; even something that simply populates a large
workspace and retrieves all the stored content would be a good start
as long as the test is repeatable and produces usable reports.
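
As a rough illustration of such a benchmark, here is a minimal Java
sketch of a repeatable populate-and-read harness. Everything in it is
an assumption made for illustration: WorkspaceBenchmark and TreeNode
are hypothetical names that do not exist in Jackrabbit, and the
in-memory tree is only a stand-in for a workspace. A real benchmark
would replace populate() and readAll() with calls against a JCR
Session.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical harness; none of these names exist in Jackrabbit.
final class WorkspaceBenchmark {

    /** Stand-in for stored content; a real run would use JCR nodes. */
    static final class TreeNode {
        final String name;
        final List<TreeNode> children = new ArrayList<>();
        TreeNode(String name) { this.name = name; }
    }

    /** Builds a tree with `fanout` children per node, `depth` levels below the root. */
    static TreeNode populate(int fanout, int depth) {
        TreeNode root = new TreeNode("root");
        fill(root, fanout, depth);
        return root;
    }

    private static void fill(TreeNode parent, int fanout, int depth) {
        if (depth == 0) return;
        for (int i = 0; i < fanout; i++) {
            TreeNode child = new TreeNode("n" + i);
            parent.children.add(child);
            fill(child, fanout, depth - 1);
        }
    }

    /** Visits every node and returns the count, so the read phase does real work. */
    static long readAll(TreeNode node) {
        long count = 1;
        for (TreeNode child : node.children) count += readAll(child);
        return count;
    }

    /** Repeats the populate-and-read cycle; returns elapsed nanoseconds, sorted ascending. */
    static long[] run(int runs, int fanout, int depth) {
        long[] elapsed = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            readAll(populate(fanout, depth));
            elapsed[i] = System.nanoTime() - start;
        }
        Arrays.sort(elapsed);
        return elapsed;
    }

    public static void main(String[] args) {
        long[] t = run(5, 10, 4);
        // Report min/median/max so that runs can be compared over time.
        System.out.printf("runs=%d min=%dus median=%dus max=%dus%n",
                t.length, t[0] / 1000, t[t.length / 2] / 1000, t[t.length - 1] / 1000);
    }
}
```

Reporting the minimum, median, and maximum over several runs instead
of a single timing is one simple way to keep the numbers comparable
from one release to the next.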

One very general underlying issue that I see as a major performance
bottleneck in the current Jackrabbit design is the reliance on
sequential operation in many critical areas of the codebase. Perhaps
the most glaring issue is the requirement to synchronize
DatabasePersistenceManager.store() even though the content being
stored is almost embarrassingly parallel. In the age of multicore
processors found even on laptop computers we should be looking at all
opportunities to parallelize the code.
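
To make the idea concrete, here is a minimal Java sketch of storing
independent items concurrently instead of behind one synchronized
method. This is not how DatabasePersistenceManager works:
ParallelStoreSketch is a hypothetical class, the ConcurrentHashMap is
only a stand-in for the database, and the sketch ignores the
atomicity that a real change set would need (for example a shared
transaction or a commit step after all writes succeed).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; not part of Jackrabbit.
final class ParallelStoreSketch {

    /** Thread-safe map standing in for the underlying database. */
    private final Map<String, byte[]> backend = new ConcurrentHashMap<>();

    /** Writes each item as its own task and returns once all writes are done. */
    void storeAll(Map<String, byte[]> changes, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (Map.Entry<String, byte[]> e : changes.entrySet()) {
                // Items are independent, so no lock is shared between tasks.
                tasks.add(() -> { backend.put(e.getKey(), e.getValue()); return null; });
            }
            // invokeAll blocks until every task has finished; get() surfaces failures.
            for (Future<Void> f : pool.invokeAll(tasks)) f.get();
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("store interrupted", ex);
        } catch (ExecutionException ex) {
            throw new IllegalStateException("store failed", ex.getCause());
        } finally {
            pool.shutdown();
        }
    }

    int size() { return backend.size(); }

    byte[] load(String id) { return backend.get(id); }
}
```

The caller still sees a single blocking storeAll() call, so a change
like this could in principle stay hidden behind the existing
interface while the individual writes run in parallel.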

Modularity
----------

Even though Jackrabbit supports a number of extension interfaces, the
basic implementation structure is IMHO still rather monolithic with
lots and lots of interdependencies across packages and individual
classes. I'm not too worried about this from a technical perspective,
but it makes it harder for the community to work on the codebase in
parallel. A more modular internal structure would enable easier
experimentation and customization, and allow us to both cover a wider
array of requirements and to accept contributions of alternative
implementations of specific features.

The downside of increased modularity is the added effort to manage all
the separate components as a coherent whole. The OSGi model provides a
compelling solution to this problem, and thus I'd like to investigate
options for using OSGi as a way to enable more modularity within
Jackrabbit.

Configuration
-------------

Just like with the modularity issue, the current Jackrabbit
configuration mechanism provides a reasonable level of flexibility for
specific features but is somewhat limited in other areas. There are
people with valid requirements that can't be met by the current
configuration model. We can do quite a lot by incrementally extending
the existing configuration mechanism, but I don't think this solution
will work in the long run especially if we opt for increased
modularity. An IoC-like configuration mechanism, introduced perhaps
along with OSGi modularization, could remove the configuration
limitations for good.

Administration
--------------

There aren't yet too many administration tools for Jackrabbit. Most
configuration needs to be done manually, we don't have a released
low-level content browser/editor tool, options for node type
modifications are very restricted, a comprehensive backup/restore
toolset is not available, etc. Luckily there are a number of ongoing
efforts that will eventually help with the various parts of this
topic, and so I'm not too worried about this issue. We just need to
spend some effort to make any administration tools that get
implemented available to the Jackrabbit users as parts of the official
releases.

Deployment
----------

The current Jackrabbit project structure and the available
documentation are mostly focused on using Jackrabbit as an embedded
model 1 repository. I think it would make sense for us to gradually
raise the "default" deployment level first to model 2 by focusing on
making the jackrabbit-webapp project more prominent, and later on to
model 3 by creating a standalone Jackrabbit installer. The last part
is obviously conditional on major improvements in remote access.

This increase in the deployment level would help make Jackrabbit more
an "infrastructure" product as outlined by the goals of the JCR
specification, and would also help manageability and configurability
by making the Jackrabbit runtime environments more predictable.

BR,

Jukka Zitting
