Hi,

On Nov 14, 2007 11:21 AM, (Berry) A.W. van Halderen <[EMAIL PROTECTED]> wrote:
> Querying; which would be the layer where the NGP operates?
Optimally I'd put the query indexes into the NGP tree model, for example as special rep:index properties of a node. This way we could have per-subtree indexes and keep accessing an older version (or a branch, like in an uncommitted transaction) of the tree structure at full performance while new content is being written to the head of the repository.

Of course, coming up with an index format that is space-efficient enough in an append-only mode is still an open question (though Lucene's segment files do look promising), so I'm not yet sure if the above vision can really be implemented. That's why I'm prototyping. :-)

> I was previously in the understanding that the NGP would be the storage
> layer which operates below or in the place of the current
> SharedItemStateManager. With the remark to implement things like
> Node.getNodes() I gather that you want to do away (in time) with the
> set of ItemStateManagers.

Correct. I think the ItemState model, while flexible and proven, is preventing us from reaching a number of performance improvements by focusing on content at a very granular level (e.g. Node.getNodes() is a victim of the classical n*SELECT problem largely because of this architecture). It also requires quite complex caching and cache invalidation logic that makes the implementation hard to follow. I also don't like the inherent need for locking and synchronization and the fact that we need to rely on external support for proper clustering.

In summary, while I do like and appreciate the current design, I also think that it's starting to show its age and that we need to look for alternatives to reach new performance and scalability levels.

> Apart from this being a total re-write, which would block a lot of progress,

This is why I'm working inside a sandbox and want to come up with *very* compelling technical arguments and measured performance improvements before suggesting to bring the code inside jackrabbit-core.
Also, I don't yet know whether the road I'm headed down will turn out to be a dead end, so for now the effort is strictly limited to prototyping. In any case, even if the NGP model proves successful in practice, I think it's realistic to expect us to change the core architecture at the earliest for something like Jackrabbit 3.0, a few years from now.

> I'm also worried that this would tie in the implementation of JCR by
> JackRabbit a lot with how things would be stored.

I'm not too worried about this. Currently the PersistenceManager model dictates much of the storage model, and in fact I think that changing this model is *the* key to any major improvements. Just like the PersistenceManager model essentially forces the storage layer into a key-value mapping, the NGP model requires an append-only tree hierarchy. My main assumption is that the latter is a more efficient and natural model for JCR content trees.

> NGP looking like a sound idea, is not the only method of storage, and I
> would rather see the ability of different storage layers with different
> characteristics.

There's nothing stopping us from having that for NGP as well. Of course the high-level architecture dictates the access patterns and the generic content structure, but this doesn't mean that the underlying bit patterns or storage locations need to be the same.

BR,

Jukka Zitting
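
P.S. To make the "append-only tree hierarchy" idea above a bit more concrete, here is a minimal in-memory sketch of how an older version of a tree can stay readable while a new head is being written. This is not code from the NGP sandbox: the class AppendOnlyTreeSketch, the Node type and the withPropertyAt method are all hypothetical names made up for illustration, and a real implementation would append records to storage rather than keep Java objects in memory.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of an append-only content tree; not Jackrabbit or NGP code. */
final class AppendOnlyTreeSketch {

    /** An immutable node: once created, its properties and children never change. */
    static final class Node {
        final Map<String, String> properties;
        final Map<String, Node> children;

        Node(Map<String, String> properties, Map<String, Node> children) {
            // Defensive copies keep each node independent of the caller's maps.
            this.properties = Collections.unmodifiableMap(new LinkedHashMap<>(properties));
            this.children = Collections.unmodifiableMap(new LinkedHashMap<>(children));
        }

        static Node empty() {
            return new Node(Collections.emptyMap(), Collections.emptyMap());
        }

        /** Returns a new node with one property set; this node is left untouched. */
        Node withProperty(String name, String value) {
            Map<String, String> copy = new LinkedHashMap<>(properties);
            copy.put(name, value);
            return new Node(copy, children);
        }

        /** Returns a new node with one child added or replaced. */
        Node withChild(String name, Node child) {
            Map<String, Node> copy = new LinkedHashMap<>(children);
            copy.put(name, child);
            return new Node(properties, copy);
        }

        /**
         * Sets a property at the given path and returns the new root.
         * Only the nodes along the path are re-created; all other
         * subtrees are shared between the old and the new root.
         */
        Node withPropertyAt(String[] path, int depth, String name, String value) {
            if (depth == path.length) {
                return withProperty(name, value);
            }
            Node child = children.getOrDefault(path[depth], empty());
            return withChild(path[depth], child.withPropertyAt(path, depth + 1, name, value));
        }
    }

    public static void main(String[] args) {
        String[] a = {"content", "a"};
        String[] b = {"content", "b"};

        Node v1 = Node.empty()
                .withPropertyAt(a, 0, "title", "first")
                .withPropertyAt(b, 0, "title", "other");
        // Writing to the "head" produces a new root; v1 is not modified.
        Node v2 = v1.withPropertyAt(a, 0, "title", "second");

        // The old root still reads the old value: prints "first"
        System.out.println(v1.children.get("content").children.get("a").properties.get("title"));
        // The new root reads the new value: prints "second"
        System.out.println(v2.children.get("content").children.get("a").properties.get("title"));
        // The untouched subtree is the same object in both versions: prints "true"
        System.out.println(v1.children.get("content").children.get("b")
                == v2.children.get("content").children.get("b"));
    }
}
```

Since nothing is ever modified in place, a reader holding the old root needs no locking against the writer, which is the property the append-only model relies on. How well this carries over to on-disk storage and to index data is exactly the open question mentioned above.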
