Re: NGP: Storage model

Thomas Mueller Mon, 14 Jan 2008 07:16:25 -0800

Hi,

> immutability paths are as stable as direct identifiers when used within the 
> context of a single repository revision


The path is not always stable across revisions, for example when
same-name siblings are used. In the following example, would deleting
customer[1] result in 1001 Lucene index updates, because customer[2]
becomes customer[1] and the paths of all its invoices change?

root
|- customer[1]
\- customer[2]
   |- invoice1
   |- invoice2
   |...
   \- invoice1000
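
To make the count concrete, here is a rough Python sketch. The tree model, the node ids ("c-a", "c-b") and the paths() helper are made up for illustration; this is not Jackrabbit or NGP code. It only assumes that same-name siblings get a 1-based index in document order.

```python
# Hypothetical model: a customer is (node_id, [invoice names]).
# Same-name siblings are numbered from 1 in document order.

def paths(customers):
    """Return {node_id: path} for every customer and invoice."""
    result = {}
    for index, (cust_id, invoices) in enumerate(customers, start=1):
        cust_path = f"/customer[{index}]"
        result[cust_id] = cust_path
        for inv in invoices:
            result[inv] = f"{cust_path}/{inv}"
    return result

invoices = [f"invoice{i}" for i in range(1, 1001)]

before = paths([("c-a", []), ("c-b", invoices)])  # customer[1], customer[2]
after = paths([("c-b", invoices)])                # customer[1] deleted

# Every surviving node whose path differs needs a path-based index update.
changed = [node for node in after if before[node] != after[node]]
print(len(changed))  # 1001: the remaining customer plus its 1000 invoices
```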

> A parent node contains the names and SHA-1 record checksums of all the
> child nodes.

That means flat hierarchies (many child nodes per node) would still be
a problem (like now)?
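
A back-of-the-envelope sketch of why I ask. The encoding here is my own assumption (UTF-8 names plus raw 20-byte SHA-1 digests, no length prefixes or other overhead), so the real numbers would differ, but the growth is linear in the number of children either way.

```python
SHA1_BYTES = 20  # a raw SHA-1 digest is 20 bytes

def parent_record_size(child_names):
    """Bytes for the child list alone: one name plus one checksum per child."""
    return sum(len(name.encode("utf-8")) + SHA1_BYTES for name in child_names)

flat = [f"invoice{i}" for i in range(1, 1001)]

print(parent_record_size(flat[:10]))  # 281 bytes for 10 children
print(parent_record_size(flat))       # 29893 bytes for 1000 children
```

If the parent record is identified by its checksum, adding or changing a single child would presumably mean writing a new copy of this whole list.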

> Parent references are not stored anywhere, which means that for each
> accessed node all the ancestor nodes must also be accessed.

That means accessing a node (referenced by its identifier) would go
like this:

- Run a Lucene query with a given revision id and identifier to get the path
- Read the root node with the right revision to get the list of children
- Find the correct child in this list
- Read this node with the right revision to get the list of its children
- And so on until you reach the node
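
The steps above can be sketched roughly as follows. Everything here is a stand-in of my own: Store is an in-memory dict instead of persistent storage, path_index is a plain dict instead of a Lucene query, and the record layout (a "data" field plus a name-to-checksum "children" map) is only assumed from the proposal.

```python
import hashlib
import json

class Store:
    """In-memory stand-in for the persistent, checksum-addressed store."""
    def __init__(self):
        self.records = {}
        self.reads = 0

    def put(self, record):
        data = json.dumps(record, sort_keys=True).encode("utf-8")
        key = hashlib.sha1(data).hexdigest()
        self.records[key] = record
        return key

    def get(self, key):
        self.reads += 1  # count every storage access
        return self.records[key]

def resolve(store, root_key, path):
    """Walk from the root record down to the node at the given path."""
    key = root_key
    for name in path.strip("/").split("/"):
        record = store.get(key)        # read the parent record
        key = record["children"][name]  # find the child in its list
    return store.get(key)              # finally read the node itself

store = Store()
invoices = {
    f"invoice{i}": store.put({"data": f"invoice {i}", "children": {}})
    for i in range(1, 1001)
}
c1 = store.put({"data": "customer 1", "children": {}})
c2 = store.put({"data": "customer 2", "children": invoices})
root = store.put({"data": "root",
                  "children": {"customer[1]": c1, "customer[2]": c2}})

# Stand-in for the Lucene query: identifier -> path in this revision.
path_index = {"uuid-7": "/customer[2]/invoice7"}

node = resolve(store, root, path_index["uuid-7"])
print(node["data"])   # invoice 7
print(store.reads)    # 3: root, customer[2], invoice7
```

Note that the read of customer[2] pulls in all 1000 child entries just to find one of them.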

I believe there are a few problems: Lucene queries are relatively
slow, and for flat hierarchies you need to read a lot of data. In
addition to the Lucene query, you need one persistent storage access
per path element.

I'm afraid this architecture would be a lot slower than the current
Jackrabbit: in the current architecture, only one persistent storage
access (and no Lucene query) is required to read one node (bundle)
by its UUID.

Regards,
Thomas
