Re: NGP: Storage model

Stefan Guggisberg Mon, 14 Jan 2008 05:54:14 -0800

hi jukka

very interesting! :) a couple of random questions follow inline...


On Jan 14, 2008 2:31 PM, Jukka Zitting <[EMAIL PROTECTED]> wrote:
> Hi,
>
> With the recent NGP interest I wanted to push some of my latest
> prototype work to the jackrabbit-ngp sandbox. Perhaps the most notable
> (though not very fleshed out) concept is the simplified storage
> mechanism that I plan to try out. Here's a quick summary of how I see
> it working.
>
> The storage model is similar to the DataStore concept in
> jackrabbit-core. All content is stored in separate "records" that are
> basically just immutable blobs identified by their SHA-1 checksums.
>
> All nodes are serialized to a binary representation and stored as
> immutable records in the system. The SHA-1 record checksum is used as
> the internal node identifier instead of an explicitly assigned UUID. A
> parent node contains the names and SHA-1 record checksums of all the
> child nodes.

what about the properties?

>
> As an example, consider a simple content tree with four nodes: the
> root node, "foo", "bar", and "baz". The "bar" node is a child of
> "foo", and "foo" and "baz" are children of the root node. In path
> notation:
>
>     /
>     /foo
>     /foo/bar
>     /baz
>
> The "bar" and "baz" nodes are empty, and could be represented by an
> empty record, with SHA-1 checksum X. The "foo" node has "bar"
> (checksum X) as a child, so could have a binary representation like
> ["bar"=X], with checksum Y. The root node has "foo" (checksum Y) and
> "baz" (checksum X) as child nodes, and could be represented as
> ["foo"=Y,"baz"=X], with checksum Z. The repository would then contain
> the following three records and some metadata that marks record Z as
> the root node.
>
>     X: []
>     Y: ["bar"=X]
>     Z: ["foo"=Y,"baz"=X]
>     root => Z
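
The record layout above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the message does not specify a binary encoding, so the sorted "name=checksum" line format, the in-memory dict standing in for the record store, and the put() helper are all hypothetical.

```python
import hashlib

def serialize(children):
    # Hypothetical encoding: one "name=checksum" line per child, sorted by
    # name so that equal nodes always produce identical bytes.
    lines = [f"{name}={ref}" for name, ref in sorted(children.items())]
    return "\n".join(lines).encode("utf-8")

def put(store, children):
    # Store an immutable record under its SHA-1 checksum and return the key.
    data = serialize(children)
    key = hashlib.sha1(data).hexdigest()
    store[key] = data
    return key

store = {}                             # stand-in for the record store
x = put(store, {})                     # empty node, used for "bar" and "baz"
y = put(store, {"bar": x})             # "foo"
z = put(store, {"foo": y, "baz": x})   # root node
root = z                               # metadata marking Z as the root

print(len(store))                      # 3 records: X, Y and Z
```

Because "bar" and "baz" serialize to the same bytes, they share a single record, which is why the example tree of four nodes needs only three records.
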
>
> A revision that adds an empty "new" node at "/foo/new" would result
> in "foo" getting a new record ["bar"=X,"new"=X] (checksum P) and the
> root node becoming ["foo"=P,"baz"=X] (checksum Q). The repository
> would then be:
>
>     X: []
>     Y: ["bar"=X]
>     Z: ["foo"=Y,"baz"=X]
>     P: ["bar"=X,"new"=X]
>     Q: ["foo"=P,"baz"=X]
>     root => Q
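
A revision like the one above only rewrites the records on the path from the changed node up to the root. The following Python sketch shows that idea. The text encoding, the get() and put() helpers, and the add_node() function are assumptions for illustration, not part of the proposal; the toy encoding also assumes node names contain no newlines.

```python
import hashlib

def put(store, children):
    # Hypothetical encoding: sorted "name=checksum" lines, keyed by SHA-1.
    lines = [f"{name}={ref}" for name, ref in sorted(children.items())]
    data = "\n".join(lines).encode("utf-8")
    key = hashlib.sha1(data).hexdigest()
    store[key] = data
    return key

def get(store, key):
    # Decode a record back into a {child name: child checksum} dict.
    text = store[key].decode("utf-8")
    return dict(line.rsplit("=", 1) for line in text.splitlines())

def add_node(store, node_key, path, leaf_key):
    # Return the key of a new record for node_key in which the node at
    # `path` points to leaf_key. Existing records are never modified;
    # only the nodes along the path get new records.
    children = get(store, node_key)
    name, rest = path[0], path[1:]
    if rest:
        children[name] = add_node(store, children[name], rest, leaf_key)
    else:
        children[name] = leaf_key
    return put(store, children)

store = {}
x = put(store, {})
y = put(store, {"bar": x})
z = put(store, {"foo": y, "baz": x})

q = add_node(store, z, ["foo", "new"], x)   # new root after adding /foo/new
p = get(store, q)["foo"]                    # new record for "foo"
```

After the call the store holds five records (X, Y, Z, P, Q), and the old root Z is still readable because nothing was overwritten.
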
>
> A session that was opened before this change could still continue
> accessing the repository with record Z as the root node until the
> session is either explicitly or implicitly refreshed to the latest
> state. Once all clients have stopped referring to Z as the root node,
> a garbage collector could reduce the repository to:
>
>     X: []
>     P: ["bar"=X,"new"=X]
>     Q: ["foo"=P,"baz"=X]
>     root => Q
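
The garbage collection step above amounts to a mark-and-sweep over the records reachable from every root that is still in use. A minimal Python sketch, using the symbolic record names from the example instead of real checksums; the reachable() and collect_garbage() helpers and the notion of a "live roots" set are assumptions, since the message does not say how the collector tracks open sessions.

```python
def reachable(store, roots):
    # Mark phase: walk child references from every live root.
    seen = set()
    stack = list(roots)
    while stack:
        key = stack.pop()
        if key in seen:
            continue
        seen.add(key)
        stack.extend(store[key].values())
    return seen

def collect_garbage(store, live_roots):
    # Sweep phase: drop every record that no live root can reach.
    keep = reachable(store, live_roots)
    for key in list(store):
        if key not in keep:
            del store[key]
    return store

store = {
    "X": {},
    "Y": {"bar": "X"},
    "Z": {"foo": "Y", "baz": "X"},
    "P": {"bar": "X", "new": "X"},
    "Q": {"foo": "P", "baz": "X"},
}

# No session refers to Z any more, so Q is the only live root.
collect_garbage(store, {"Q"})
print(sorted(store))   # ['P', 'Q', 'X']
```

If a session still held Z as its root, passing {"Q", "Z"} as the live roots would keep all five records.
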
>
> The only synchronization point in this scheme would be changing the
> root pointer to a more recent version of the root node. A client that
> wants to persist a new revision can store all the records included in
> the revision, perform any required consistency checks, and finally
> update the root pointer to the validated new root record. Almost all
> of this can be done in parallel with other clients; only when changing
> the root pointer does the client need to verify that nobody else has
> updated it in the meantime. If the root pointer has changed,
> the client needs to repeat any merging and validation steps before
> retrying the update. In typical scenarios such write conflicts should
> be relatively rare.
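
The commit protocol above is essentially an optimistic compare-and-set loop on the root pointer. A minimal Python sketch under stated assumptions: the RootPointer class, the commit() function, the build_revision callback and the retry limit are all hypothetical names chosen for illustration, and a lock stands in for whatever atomic primitive a real backend would offer.

```python
import threading

class RootPointer:
    # Holds the key of the current root record. This is the only shared,
    # mutable piece of state in the scheme.
    def __init__(self, initial):
        self._value = initial
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            return self._value

    def compare_and_set(self, expected, new):
        # Swap the pointer only if nobody has changed it in the meantime.
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True

def commit(root, build_revision, max_retries=10):
    # build_revision(base) stores the new records, merges against `base`,
    # validates the result and returns the key of the new root record.
    # All of that runs without holding any lock.
    for _ in range(max_retries):
        base = root.get()
        candidate = build_revision(base)
        if root.compare_and_set(base, candidate):
            return candidate
        # Another client won the race: merge and validate again.
    raise RuntimeError("too many concurrent root updates")

root = RootPointer("Z")
new_root = commit(root, lambda base: "Q")
```

A client that still believes Z is the current root would now fail its compare_and_set("Z", ...) call and have to rebuild its revision on top of Q.
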
>
> There are some notable implications of such a storage model:
>
> Parent references are not stored anywhere, which means that for each
> accessed node all the ancestor nodes must also be accessed. This is a
> requirement in any case if we want to enforce hierarchical access
> controls or other policies.

how would you build the path of a node accessed by uuid?

cheers
stefan

>
> Explicit UUIDs are stored as literal jcr:uuid properties and REFERENCE
> properties are just specially typed string properties. Indexing is
> used to speed up getNodeByUUID() lookups, making getNodeByUUID
> essentially equivalent to an XPath query like //*[@jcr:uuid='...'].
> Referential integrity is handled explicitly on a higher level. Because
> of this, hard references and direct UUID access will likely be worse than
> in current jackrabbit-core, but to me that's a conscious design
> tradeoff.
>
> To make queries work properly for clients that use any past version of
> the root node, search indexes should be stored as a part of the
> content tree instead of outside it. This way a content update will
> always include the respective index updates. To best reuse our current
> query engine, I would store the index files within a special
> /rep:index node. Lucene's segment file model should work well with
> immutable records.
>
> This storage model is quite simple to implement on the file system and
> there's also a trivial mapping to HTTP. In fact any web server that
> supports the GET and PUT methods and the ETag, If-Match, and
> If-None-Match headers should be directly usable as a backend for this
> storage model. Such record resources would also be trivially
> cacheable.
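
One way the HTTP mapping mentioned above could look is sketched below in Python. The message names only the methods and headers, so everything else here is an assumption: the server name, the /records/<checksum> and /root URL layout, and the two helper functions are hypothetical. The sketch only builds the requests and does not send them.

```python
import urllib.request

BASE_URL = "http://example.invalid/repo"   # hypothetical server and layout

def record_put_request(checksum, data):
    # Records are immutable, so "If-None-Match: *" asks the server to
    # create the resource only if it does not exist yet.
    return urllib.request.Request(
        f"{BASE_URL}/records/{checksum}",
        data=data,
        method="PUT",
        headers={"If-None-Match": "*"},
    )

def root_update_request(new_root, etag_seen):
    # The root pointer is the one mutable resource. "If-Match" makes the
    # PUT succeed only if the root still carries the ETag this client last
    # saw; otherwise the server answers 412 Precondition Failed and the
    # client has to merge, validate and retry.
    return urllib.request.Request(
        f"{BASE_URL}/root",
        data=new_root.encode("ascii"),
        method="PUT",
        headers={"If-Match": etag_seen},
    )

put_x = record_put_request("da39a3ee5e6b4b0d3255bfef95601890afd80709", b"")
swap = root_update_request("Q", '"etag-of-Z"')
```

Reading a record would be a plain GET on the same record URL, and since a record never changes, any cache may keep the response indefinitely.
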
>
> BR,
>
> Jukka Zitting
>
