Hi,

There are a lot of things going on around Jackrabbit, and I'd like to summarize my thoughts on some of the efforts in an attempt to start drafting a Jackrabbit roadmap that would present our views of what will likely be happening within the next few years. This roadmap could then be used to create more detailed release plans and to evaluate progress over time.
I've organized this message around various topics, in no particular order. Feedback on any or all topics is very much welcome! Please feel free to start a new thread if you'd like to focus on some particular issue in more detail.

JSR 283
-------

The JCR 2.0 spec being defined by JSR 283 will be a major source of new features to be added to Jackrabbit within the next year or so. The current plan is to target Jackrabbit 2.0 as the JSR 283 reference implementation, which means that we'll need to have at least some sort of an implementation of all the new features being introduced by the updated spec.

I'd like to use the JSR 283 public review draft as the baseline requirements document for Jackrabbit 2.0. Once the draft is out we should create feature requests for all the major changes and start working on an iterative plan to get all the features implemented by the 2.0 release.

If possible, we should introduce some of the new features already during the 1.x release cycle, perhaps by including any API changes in jackrabbit-api instead of depending on any jcr-2.0.jar snapshots.

Documentation
-------------

Documentation is definitely one of the key "features" that need major improvements in Jackrabbit. To raise awareness of the need for documentation, I'm planning to start including selected documentation tasks in future release plans.

It would also be nice to come up with ways to better tap into the knowledge transfer that is happening on the Jackrabbit mailing lists every day. I'd especially like to make it easier to contribute FAQ updates and other similar documentation improvements. What should we do to help people volunteer with these issues?

SPI
---

The SPI effort has been ongoing for a while already, and it would be good to come up with a clear idea of how and when we are going to integrate it with the rest of Jackrabbit.
The SPI model introduces a major architectural change, and I'm worried about the adverse effects that change may have if the integration with the rest of Jackrabbit is not properly managed. I'd most like to see the SPI being introduced as an evolutionary step to jackrabbit-core rather than using just parts of the current core to build a revolutionary new jackrabbit-spi-backend implementation.

The applicability of the SPI as an intermediate layer within a local repository implementation, as opposed to a network remoting layer, has also not yet been discussed in much detail. Before integrating the SPI with the rest of the project it would be good to review the design decisions both to verify that we're not missing anything and (even more importantly) to better educate the development community about the details of the SPI model.

Finally, I'm worried about the duplication of effort in the implementation of transient space, name mapping, and other client-side features in spi-client and jackrabbit-core. Moving at least parts of such code to jackrabbit-jcr-commons would be a good idea.

Remote access
-------------

The demand for fast remote JCR access to Jackrabbit repositories will most likely become more pressing as more people use Jackrabbit in various environments. There are some improvements we could make in JCR-RMI to speed things up, but the basic design of JCR-RMI imposes some hard performance limits, especially on high-latency connections.

The SPI and the WebDAV work related to it seem like a nice solution to this issue. Unfortunately this solution still needs more work, and since JCR-RMI already scratches some of the itch, I'm not sure when we'll see concrete results. Perhaps this would be a good area for co-operation with other repository projects?

Clustering
----------

The recent clustering work by Dominique is a great addition to Jackrabbit, and I'm looking forward to seeing more success and problem reports from people using the new clustering features.
Architecturally speaking, the current clustering design is IMHO not optimal, as it actually makes the normally "dumb" persistence manager do the hard work by relying on a shared underlying database. I think we should eventually find a way to manage all the clustering work above the persistence layer, but just like with remote access there's no immediate need for improvements as the current solution covers the basic needs.

Access control
--------------

A number of people have been writing their own custom AccessManagers to implement access controls in Jackrabbit repositories. This is a rather heavy investment for many of the basic access control use cases that could well be met by a more generic AccessManager implementation. Contributions in this area would probably be widely appreciated.

Performance
-----------

So far performance and scalability issues have come up every now and then as isolated incidents or discussions, which to me seems good as it shows that we don't have any glaring performance problems that every other user would be tripping over. This however doesn't mean that we couldn't do better, and I'm quite convinced that the growing Jackrabbit install base is sooner or later going to include people with very strict performance and scalability requirements that can't easily be met by just throwing in more hardware.

To better prepare for solving such issues it would be good for us to have some standard performance benchmarks in place. They wouldn't need to be very complex; even something that simply populates a large workspace and retrieves all the stored content would be a good start, as long as the test is repeatable and produces usable reports.

One very general underlying issue that I see as a major performance bottleneck in the current Jackrabbit design is the reliance on sequential operation in many critical areas of the codebase.
Perhaps the most glaring issue is the requirement to synchronize DatabasePersistenceManager.store() even though the content being stored is almost embarrassingly parallel. In the age of multicore processors found even in laptop computers we should be looking at all opportunities to parallelize the code.

Modularity
----------

Even though Jackrabbit supports a number of extension interfaces, the basic implementation structure is IMHO still rather monolithic with lots and lots of interdependencies across packages and individual classes. I'm not too worried about this from a technical perspective, but it makes it harder for the community to work on the codebase in parallel.

A more modular internal structure would enable easier experimentation and customization, and allow us both to cover a wider array of requirements and to accept contributions of alternative implementations of specific features.

The downside of increased modularity is the added effort to manage all the separate components as a coherent whole. The OSGi model provides a compelling solution to this problem, and thus I'd like to investigate options for using OSGi as a way to enable more modularity within Jackrabbit.

Configuration
-------------

Just like with the modularity issue, the current Jackrabbit configuration mechanism provides a reasonable level of flexibility for specific features but is somewhat limited in other areas. There are people with valid requirements that can't be met by the current configuration model.

We can do quite a lot by incrementally extending the existing configuration mechanism, but I don't think this solution will work in the long run, especially if we opt for increased modularity. An IoC-like configuration mechanism, introduced perhaps along with OSGi modularization, could remove the configuration limitations for good.

Administration
--------------

There aren't yet too many administration tools for Jackrabbit.
Most configuration needs to be done manually, we don't have a released low-level content browser/editor tool, options for node type modifications are very restricted, a comprehensive backup/restore toolset is not available, etc.

Luckily there are a number of ongoing efforts that will eventually help with the various parts of this topic, and so I'm not too worried about this issue. We just need to spend some effort to make any administration tools that get implemented available to the Jackrabbit users as parts of the official releases.

Deployment
----------

The current Jackrabbit project structure and the available documentation are mostly focused on using Jackrabbit as an embedded model 1 repository. I think it would make sense for us to gradually raise the "default" deployment level first to model 2 by focusing on making the jackrabbit-webapp project more prominent, and later on to model 3 by creating a standalone Jackrabbit installer. The last part is obviously conditional on major improvements in remote access.

This increase in the deployment level would help make Jackrabbit more of an "infrastructure" product as outlined by the goals of the JCR specification, and would also help manageability and configurability by making the Jackrabbit runtime environments more predictable.

BR,

Jukka Zitting
