Hi,

With the recent NGP interest I wanted to push some of my latest
prototype work to the jackrabbit-ngp sandbox. Perhaps the most notable
(though not very fleshed out) concept is the simplified storage
mechanism that I plan to try out. Here's a quick summary of how I see
it working.

The storage model is similar to the DataStore concept in
jackrabbit-core. All content is stored in separate "records" that are
basically just immutable blobs identified by their SHA-1 checksums.

All nodes are serialized to a binary representation and stored as
immutable records in the system. The SHA-1 record checksum is used as
the internal node identifier instead of an explicitly assigned UUID. A
parent node contains the names and SHA-1 record checksums of all the
child nodes.
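
To make this a bit more concrete, here's a rough in-memory sketch of
what such a record store could look like. The names and structure are
made up for this mail, not code from the sandbox:

    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;
    import java.util.HashMap;
    import java.util.Map;

    /**
     * In-memory sketch of the record store: immutable blobs keyed by
     * their SHA-1 checksums.
     */
    class RecordStore {

        private final Map<String, byte[]> records = new HashMap<>();

        /** Stores an immutable record and returns its SHA-1 checksum as hex. */
        String put(byte[] record) {
            String id = sha1(record);
            records.putIfAbsent(id, record.clone());
            return id;
        }

        /** Retrieves a record by its SHA-1 identifier. */
        byte[] get(String id) {
            byte[] record = records.get(id);
            return record != null ? record.clone() : null;
        }

        private static String sha1(byte[] data) {
            try {
                StringBuilder hex = new StringBuilder();
                for (byte b : MessageDigest.getInstance("SHA-1").digest(data)) {
                    hex.append(String.format("%02x", b & 0xff));
                }
                return hex.toString();
            } catch (NoSuchAlgorithmException e) {
                throw new IllegalStateException(e);
            }
        }
    }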

As an example, consider a simple content tree with four nodes: the
root node, "foo", "bar", and "baz". The "bar" node is a child of
"foo", and "foo" and "baz" are children of the root node. In path
notation:

    /
    /foo
    /foo/bar
    /baz

The "bar" and "baz" nodes are empty, and could  be represented by an
empty record, with SHA-1 checksum X. The "foo" node has "bar"
(checksum X) as a child, so could have a binary representation like
["bar"=X], with checksum Y. The root node has "foo" (checksum Y) and
"baz" (checksum X) as child nodes, and could be represented as
["foo=:Y,"baz"=X], with checksum Z. The repository would then contain
the following three records and some metadata that marks record Z as
the root node.

    X: []
    Y: ["bar"=X]
    Z: ["foo"=:Y,"baz"=X]
    root => Z
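
In code the example could be built up roughly like this, reusing the
RecordStore sketch above. The ["name"=checksum] text format is just
the notation of this mail, not a proposed serialization; child names
are sorted so the checksum is deterministic:

    import java.nio.charset.StandardCharsets;
    import java.util.Map;
    import java.util.TreeMap;
    import java.util.stream.Collectors;

    /**
     * Builds the X/Y/Z example records. The real identifiers are of
     * course 40-character SHA-1 hashes rather than single letters.
     */
    class ExampleTree {

        /** Serializes a node as ["a"=id,"b"=id,...] over its sorted child map. */
        static byte[] serialize(Map<String, String> children) {
            String body = new TreeMap<>(children).entrySet().stream()
                    .map(e -> "\"" + e.getKey() + "\"=" + e.getValue())
                    .collect(Collectors.joining(","));
            return ("[" + body + "]").getBytes(StandardCharsets.UTF_8);
        }

        public static void main(String[] args) {
            RecordStore store = new RecordStore();

            String x = store.put(serialize(Map.of()));                   // "bar" and "baz"
            String y = store.put(serialize(Map.of("bar", x)));           // "foo"
            String z = store.put(serialize(Map.of("foo", y, "baz", x))); // root node

            String root = z;   // the metadata entry: root => Z
            System.out.println("root => " + root);
        }
    }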

A revision that adds an empty "new" node at "/foo/new" would result
in "foo" getting a new record ["bar"=X,"new"=X] (checksum P) and the
root node becoming ["foo"=P,"baz"=X] (checksum Q). The repository
would then be:

    X: []
    Y: ["bar"=X]
    Z: ["foo"=:Y,"baz"=X]
    P: ["bar"=:X,"new"=X]
    Q: ["foo"=:P,"baz"=X]
    root => Q
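
In terms of the sketches above the whole revision boils down to
writing the two new records; everything else stays untouched:

    import java.util.Map;

    /**
     * Adding /foo/new writes exactly two new records (P and Q) and
     * leaves X, Y and Z as they are in the store.
     */
    class AddFooNew {

        /** Given the id X of the shared empty record, returns the new root id Q. */
        static String apply(RecordStore store, String x) {
            // New "foo" record P has the empty "new" child next to "bar" (both point at X).
            String p = store.put(ExampleTree.serialize(Map.of("bar", x, "new", x)));
            // New root record Q refers to P instead of Y; "baz" still shares X.
            return store.put(ExampleTree.serialize(Map.of("foo", p, "baz", x)));
        }
    }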

A session that was opened before this change could still continue
accessing the repository with record Z as the root node until the
session is either explicitly or implicitly refreshed to the latest
state. Once all clients have stopped referring to Z as the root node,
a garbage collector could reduce the repository to:

    X: []
    P: ["bar"=:X,"new"=X]
    Q: ["foo"=:P,"baz"=X]
    root => Q
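
The collector itself could be a plain mark-and-sweep over the record
graph, something like the following sketch (again on top of the toy
serialization used above):

    import java.nio.charset.StandardCharsets;
    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.HashSet;
    import java.util.Set;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    /**
     * Mark phase: everything reachable from the root records still
     * referenced by open sessions survives; a sweep would then drop
     * every stored record not in the returned set.
     */
    class RecordGarbageCollector {

        private static final Pattern CHILD_ID = Pattern.compile("=([0-9a-f]{40})");

        static Set<String> mark(RecordStore store, Set<String> liveRoots) {
            Set<String> reachable = new HashSet<>();
            Deque<String> queue = new ArrayDeque<>(liveRoots);
            while (!queue.isEmpty()) {
                String id = queue.pop();
                if (reachable.add(id)) {
                    String record = new String(store.get(id), StandardCharsets.UTF_8);
                    Matcher m = CHILD_ID.matcher(record);
                    while (m.find()) {
                        queue.push(m.group(1));
                    }
                }
            }
            return reachable;
        }
    }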

The only synchronization point in this scheme would be changing the
root pointer to a more recent version of the root node. A client that
wants to persist a new revision can store all the records included in
the revision, perform any required consistency checks, and finally
update the root pointer to the validated new root record. Almost all
of this can be done in parallel with other clients; only when changing
the root pointer does the client need to verify that nobody else has
updated it in the meantime. If the root pointer has changed,
the client needs to repeat any merging and validation steps before
retrying the update. In typical scenarios such write conflicts should
be relatively rare.
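
Within a single process this is essentially a compare-and-set loop on
the root pointer. Roughly like this, with the merge function standing
in for whatever merge and validation logic we end up with:

    import java.util.concurrent.atomic.AtomicReference;
    import java.util.function.BinaryOperator;

    /**
     * The single synchronization point as a compare-and-set loop.
     */
    class RootPointer {

        private final AtomicReference<String> root;

        RootPointer(String initialRoot) {
            this.root = new AtomicReference<>(initialRoot);
        }

        String current() {
            return root.get();
        }

        /** Moves the root from the revision this client started from to its new root record. */
        String commit(String baseRoot, String newRoot, BinaryOperator<String> merge) {
            String expected = baseRoot;
            String candidate = newRoot;
            while (!root.compareAndSet(expected, candidate)) {
                expected = root.get();                        // someone else committed first
                candidate = merge.apply(expected, candidate); // re-merge, re-validate, retry
            }
            return candidate;
        }
    }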

There are some notable implications of such a storage model:

Parent references are not stored anywhere, which means that for each
accessed node all the ancestor nodes must also be accessed. This is a
requirement in any case if we want to enforce hierarchical access
controls or other policies.
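
As a sketch, a path lookup over the toy records would simply walk
down from the root record, applying any access checks at each step:

    import java.nio.charset.StandardCharsets;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    /**
     * Path lookup: every access walks down from the root record, so
     * each ancestor is available for access checks along the way.
     */
    class PathResolver {

        /** Returns the record id at the given absolute path, or null if it does not exist. */
        static String resolve(RecordStore store, String rootId, String path) {
            String id = rootId;
            for (String name : path.split("/")) {
                if (name.isEmpty()) {
                    continue;   // skip the empty segment before the leading slash
                }
                // A hierarchical access control check on the current ancestor would go here.
                String record = new String(store.get(id), StandardCharsets.UTF_8);
                Matcher m = Pattern
                        .compile("\"" + Pattern.quote(name) + "\"=([0-9a-f]{40})")
                        .matcher(record);
                if (!m.find()) {
                    return null;
                }
                id = m.group(1);
            }
            return id;
        }
    }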

Explicit UUIDs are stored as literal jcr:uuid properties and REFERENCE
properties are just specially typed string properties. Indexing is
used to speed up getNodeByUUID() lookups, making getNodeByUUID
essentially equivalent to an XPath query like //*[@jcr:uuid='...'].
Referential integrity is handled explicitly on a higher level. Because
of this, hard references and direct UUID access will likely perform worse than
in current jackrabbit-core, but to me that's a conscious design
tradeoff.
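
Conceptually the index is nothing more than a map from jcr:uuid
values to record identifiers, maintained per revision (the real index
would of course be persisted, see the next paragraph):

    import java.util.HashMap;
    import java.util.Map;

    /**
     * Sketch of the UUID index: jcr:uuid is just an indexed property
     * value, so getNodeByUUID() becomes an index lookup rather than a
     * direct record fetch.
     */
    class UuidIndex {

        private final Map<String, String> byUuid = new HashMap<>();

        /** Called while indexing a revision for each node that has a jcr:uuid property. */
        void index(String uuid, String recordId) {
            byUuid.put(uuid, recordId);
        }

        /** The moral equivalent of //*[@jcr:uuid='...'] over the indexed revision. */
        String getNodeByUUID(String uuid) {
            return byUuid.get(uuid);
        }
    }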

To make queries work properly for clients that use any past version of
the root node, search indexes should be stored as a part of the
content tree instead of outside it. This way a content update will
always include the respective index updates. To best reuse our current
query engine, I would store the index files within a special
/rep:index node. Lucene's segment file model should work well with
immutable records.

This storage model is quite simple to implement on the file system and
there's also a trivial mapping to HTTP. In fact any web server that
supports the GET and PUT methods and the ETag, If-Match, and
If-None-Match headers should be directly usable as a backend for this
storage model. Such record resources would also be trivially
cacheable.
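
Just to illustrate the mapping, the backend operations could look
roughly like this with Java's standard HTTP client; the URL layout
below is made up for this mail, any layout with the same semantics
would do:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    /**
     * Records are immutable resources named by their checksum and the
     * root pointer is moved with a conditional PUT.
     */
    class HttpRecordBackend {

        private final HttpClient client = HttpClient.newHttpClient();
        private final String base;   // e.g. "http://example.org/repository"

        HttpRecordBackend(String base) {
            this.base = base;
        }

        /** Fetches an immutable record; these responses can be cached indefinitely. */
        byte[] getRecord(String id) throws Exception {
            HttpRequest request = HttpRequest
                    .newBuilder(URI.create(base + "/records/" + id))
                    .GET().build();
            return client.send(request, HttpResponse.BodyHandlers.ofByteArray()).body();
        }

        /** Moves the root pointer, guarded by the ETag of the root resource we last saw. */
        boolean updateRoot(String newRootId, String expectedEtag) throws Exception {
            HttpRequest request = HttpRequest
                    .newBuilder(URI.create(base + "/root"))
                    .header("If-Match", expectedEtag)
                    .PUT(HttpRequest.BodyPublishers.ofString(newRootId))
                    .build();
            int status = client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
            return status / 100 == 2;   // 412 Precondition Failed means someone else won the race
        }
    }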

BR,

Jukka Zitting
