Re: NGP: Storage model

Jukka Zitting Mon, 14 Jan 2008 08:59:28 -0800

Hi,

On Jan 14, 2008 6:17 PM, Thomas Mueller <[EMAIL PROTECTED]> wrote:
> > For now I'm happy with just a linear list of all the child nodes.
>
> With the current Jackrabbit: I agree. But I believe we need a solution
> for this in the future.


Agreed.

> > I'm not too concerned about hitting all parent nodes when accessing a
> > node.
>
> I am concerned if there is no faster way.

It's bad if you consider just a single node access, but the extra cost
gets amortized quickly as the parent nodes get cached. For example,
consider a repository with 1M+ nodes organized in six levels where
each node has 10 child nodes. The number of nodes at each level would
be:

  0: 1
  1: 10
  2: 100
  3: 1,000
  4: 10,000
  5: 100,000
  6: 1,000,000

A traversal of all the leaf nodes would only require loading 1,111,111
nodes (assuming the frequently accessed parent nodes are cached in
memory) as opposed to the 1,000,000 nodes that would be strictly
necessary. Assuming that access performance is mostly governed by
persistent storage accesses, that's only an 11% increase over the best
case.

Even in a worst-case scenario where you access just a single node in
each group of sibling leaf nodes, the overhead is about 111%, not the
600% you'd see when accessing just a single leaf node.
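
The figures above can be reproduced with a short calculation. This is
only a minimal sketch in Python, assuming a complete tree with fan-out
10 and leaves at level 6, and assuming that each parent node is loaded
from storage exactly once and then served from cache. The function and
variable names are made up for illustration.

```python
def overhead_percent(loaded, needed):
    """Extra storage loads relative to the strictly necessary ones, in percent."""
    return 100.0 * (loaded - needed) / needed


FANOUT = 10
DEPTH = 6  # leaf nodes live at level 6, the root at level 0

# Number of nodes at each level: 1, 10, 100, ..., 1,000,000
level_sizes = [FANOUT ** level for level in range(DEPTH + 1)]
leaves = level_sizes[-1]          # 1,000,000 leaf nodes
parents = sum(level_sizes[:-1])   # 111,111 non-leaf nodes

# Case 1: traverse every leaf; each parent is loaded once, then cached.
full_traversal = overhead_percent(parents + leaves, leaves)

# Case 2: access one leaf in each group of 10 siblings; all parents are
# still loaded, but only a tenth of the leaves are actually wanted.
sampled = leaves // FANOUT
one_per_group = overhead_percent(parents + sampled, sampled)

# Case 3: a single cold access loads the leaf plus its 6 ancestors.
single_leaf = overhead_percent(DEPTH + 1, 1)

print(f"full traversal: {full_traversal:.1f}%")
print(f"one per group:  {one_per_group:.1f}%")
print(f"single leaf:    {single_leaf:.1f}%")
```

The three cases correspond to the 11%, 111%, and 600% figures quoted
above.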

In real world cases I'd be surprised if the amortized cost of
accessing the parent nodes would be more than 25%. I think that's
quite acceptable, especially since then you'll get hierarchical access
control almost for free without special "no access control below me"
flags or other such workarounds.
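
To illustrate why the access control comes almost for free: if
resolving a path already visits every ancestor, a check at each step
costs nothing extra in storage accesses. The following is a rough
Python sketch under made-up assumptions, not the actual Jackrabbit or
NGP design. The Node class, the "denied" set, and the resolve()
function are all hypothetical.

```python
class Node:
    """Hypothetical in-memory node; 'denied' lists users blocked here and below."""

    def __init__(self, denied=()):
        self.children = {}
        self.denied = set(denied)


def resolve(root, path, user):
    """Walk from the root to 'path', checking access at every node visited."""
    node = root
    if user in node.denied:
        raise PermissionError("/")
    visited = ""
    for name in (part for part in path.split("/") if part):
        node = node.children[name]   # the parent had to be loaded anyway
        visited += "/" + name
        if user in node.denied:      # a denial at an ancestor covers the subtree
            raise PermissionError(visited)
    return node


# Example tree: /public and /private/doc, with "guest" denied at /private.
root = Node()
root.children["public"] = Node()
root.children["private"] = Node(denied={"guest"})
root.children["private"].children["doc"] = Node()

resolve(root, "/private/doc", "admin")      # allowed
try:
    resolve(root, "/private/doc", "guest")  # blocked at /private
except PermissionError as denied_at:
    print("denied at", denied_at)
```

Note that /private/doc carries no access control data of its own; the
denial is picked up on the way down from its parent.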

Also, quite a few of the best practices that have been floating around
here emphasize using path-based access over UUID access. In that light
I think we should actually put most of our effort into optimizing
path-based access.

> > As for the performance of the Lucene query. If it really ends up being
> > a problem, we can switch to a custom UUID index.
>
> Sure. This index would need to be updated whenever a value of the node
> changes, right? (That wouldn't necessarily be a problem if we mostly
> have read access).

The UUID index would only need to be updated when referenceable nodes
are created, moved, or deleted. Changes to existing nodes would not
trigger index updates, as the UUID index would just map the UUID to
the path where the node can be found.
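
As a rough illustration of the idea, here is a minimal Python sketch of
such an index. It is not an existing Jackrabbit API; the UuidIndex
class and its method names are hypothetical, and a plain in-memory
dictionary stands in for whatever persistent structure would really be
used.

```python
class UuidIndex:
    """Hypothetical UUID-to-path index for referenceable nodes."""

    def __init__(self):
        self._paths = {}  # UUID string -> current path of the node

    def node_created(self, uuid, path):
        self._paths[uuid] = path

    def node_moved(self, uuid, new_path):
        if uuid not in self._paths:
            raise KeyError(uuid)
        self._paths[uuid] = new_path

    def node_deleted(self, uuid):
        self._paths.pop(uuid, None)

    def lookup(self, uuid):
        """Return the path of the node, or None if the UUID is unknown."""
        return self._paths.get(uuid)


index = UuidIndex()
index.node_created("uuid-1", "/content/a")
# Changing a property value of /content/a needs no index call at all.
index.node_moved("uuid-1", "/content/b")
print(index.lookup("uuid-1"))
index.node_deleted("uuid-1")
print(index.lookup("uuid-1"))
```

The sketch leaves out one case: moving an ancestor changes the path of
every referenceable node below it, so a real index would have to
update those entries as well.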

BR,

Jukka Zitting
