Hi,

On Jan 14, 2008 6:17 PM, Thomas Mueller <[EMAIL PROTECTED]> wrote:
> > For now I'm happy with just a linear list of all the child nodes.
>
> With the current Jackrabbit: I agree. But I believe we need a solution
> for this in the future.

Agreed.

> > I'm not too concerned about hitting all parent nodes when accessing a
> > node.
>
> I am concerned if there is no faster way.

It's bad if you consider just a single node access, but the extra cost
gets amortized quickly as the parent nodes get cached. For example,
consider a repository with 1M+ nodes organized in six levels below the
root, where each non-leaf node has 10 child nodes. The number of nodes
at each level would be:

    0: 1
    1: 10
    2: 100
    3: 1,000
    4: 10,000
    5: 100,000
    6: 1,000,000

A traversal of all the leaf nodes would only require loading 1,111,111
nodes (assuming the frequently accessed parent nodes are cached in
memory) as opposed to the 1,000,000 nodes that would be strictly
necessary. Assuming that access performance is mostly governed by
persistent storage accesses, that's only an 11% increase over the best
case. Even in a worst case scenario where you access just a single
sibling in each group of leaf nodes (100,000 leaf nodes plus the
111,111 parent nodes), the overhead is about 111%, not the 600% you'd
see when accessing just a single leaf node.

In real world cases I'd be surprised if the amortized cost of accessing
the parent nodes would be more than 25%. I think that's quite
acceptable, especially since you'll then get hierarchical access
control almost for free, without special "no access control below me"
flags or other such workarounds.

Also, quite a few of the best practices that have been floating around
here emphasize using path-based access over UUID access. In that light
I think we should actually put most of our effort into optimizing
path-based access.

> > As for the performance of the Lucene query. If it really ends up being
> > a problem, we can switch to a custom UUID index.
>
> Sure. This index would need to be updated whenever a value of the node
> changes, right? (That wouldn't necessarily be a problem if we mostly
> have read access).

The UUID index would only need to be updated when referenceable nodes
are created, moved, or deleted.
Changes to existing nodes would not trigger index updates, as the UUID
index would just map the UUID to the path where the node can be found.

BR,

Jukka Zitting
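
For anyone who wants to check the numbers in the six-level example, here
is a minimal sketch of the arithmetic. The class and method names are
made up for illustration and are not part of Jackrabbit; the only
assumption is a complete tree with a fixed fanout, where every parent
node is loaded from storage once and cached afterwards.

```java
// Hypothetical helper, not Jackrabbit code: reproduces the node-count
// arithmetic for a complete tree with a fixed number of children per node.
final class TraversalCost {

    /** Total number of nodes on levels 0..depth (sum of fanout^i). */
    static long totalNodes(int fanout, int depth) {
        long total = 0;
        long level = 1; // number of nodes on the current level
        for (int i = 0; i <= depth; i++) {
            total += level;
            level *= fanout;
        }
        return total;
    }

    /** Number of nodes on the deepest level (fanout^depth). */
    static long leafNodes(int fanout, int depth) {
        long leaves = 1;
        for (int i = 0; i < depth; i++) {
            leaves *= fanout;
        }
        return leaves;
    }

    /** Extra loads relative to the strictly necessary loads, in percent. */
    static double overheadPercent(long loaded, long useful) {
        return 100.0 * (loaded - useful) / useful;
    }

    public static void main(String[] args) {
        int fanout = 10;
        int depth = 6;
        long total = totalNodes(fanout, depth);
        long leaves = leafNodes(fanout, depth);
        long parents = total - leaves;

        // Full traversal: every leaf is wanted, every parent is loaded once.
        System.out.println("full traversal: "
                + overheadPercent(total, leaves) + "%");

        // One leaf per sibling group: a tenth of the leaves, all parents.
        long sparse = leaves / fanout;
        System.out.println("one leaf per group: "
                + overheadPercent(parents + sparse, sparse) + "%");

        // Single leaf, cold cache: the whole path from the root is loaded.
        System.out.println("single leaf: "
                + overheadPercent(depth + 1, 1) + "%");
    }
}
```

The three cases correspond to the roughly 11%, 111%, and 600% figures
mentioned above.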
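
To make the UUID index idea a bit more concrete, here is a rough
in-memory sketch of a UUID-to-path map that is touched only on create,
move, and delete. Everything in it (the UuidPathIndex class, its method
names, the plain string paths) is a hypothetical illustration, not an
existing Jackrabbit API, and it assumes that a move also has to rewrite
the entries of referenceable nodes below the moved node.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch, not Jackrabbit code: maps the UUID of each
// referenceable node to the path where the node can be found.
final class UuidPathIndex {

    private final Map<UUID, String> paths = new HashMap<>();

    /** Called when a referenceable node is created at the given path. */
    void nodeCreated(UUID uuid, String path) {
        paths.put(uuid, path);
    }

    /**
     * Called when the node at oldPath is moved to newPath. Rewrites the
     * entry for the node itself and for any indexed nodes below it.
     */
    void nodeMoved(String oldPath, String newPath) {
        String prefix = oldPath + "/";
        for (Map.Entry<UUID, String> entry : paths.entrySet()) {
            String path = entry.getValue();
            if (path.equals(oldPath)) {
                entry.setValue(newPath);
            } else if (path.startsWith(prefix)) {
                entry.setValue(newPath + path.substring(oldPath.length()));
            }
        }
    }

    /** Called when a referenceable node is deleted. */
    void nodeDeleted(UUID uuid) {
        paths.remove(uuid);
    }

    /** Returns the current path of the node, or null if it is unknown. */
    String pathOf(UUID uuid) {
        return paths.get(uuid);
    }
}
```

Note that there is no method for property changes: setting a value on an
existing node never reaches the index. The linear scan in nodeMoved is
only there to keep the sketch short; a persistent index would presumably
want something smarter for moves of large subtrees.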
