Re: understanding jackrabbit datastorage

Jukka Zitting Fri, 27 Apr 2007 11:48:38 -0700

Hi,

On 4/27/07, Stefan Kurla <[EMAIL PROTECTED]> wrote:

> I guess this is more suited for the dev list.


Yep.

> How is the data actually stored in jackrabbit say using mysql for
> example and we are just using the default workspace.


A good starting point in understanding the underlying storage model of
Jackrabbit is to look at the PersistenceManager interface [1]. The
actual physical storage model depends on the persistence manager
implementation you are using, but the logical model is fixed by the
interface.

The PersistenceManager abstraction essentially treats all nodes and
properties as individually addressable items that each have their own
unique identifier. In addition to these items the interface also
defines a mechanism to store and access all the references pointing to
a node.
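
As a rough illustration of that logical model, here is a minimal
in-memory sketch. It is not the PersistenceManager API: the class and
method names are hypothetical, and plain strings and byte arrays stand
in for the real identifier and state types.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the logical model; not Jackrabbit's actual API.
class LogicalStoreSketch {
    // node identifier -> serialized node state
    private final Map<String, byte[]> nodes = new HashMap<>();
    // property identifier -> serialized property state
    private final Map<String, byte[]> props = new HashMap<>();
    // target node identifier -> identifiers of the properties referencing it
    private final Map<String, List<String>> refs = new HashMap<>();

    void storeNode(String nodeId, byte[] state) {
        nodes.put(nodeId, state);
    }

    void storeProperty(String propId, byte[] state) {
        props.put(propId, state);
    }

    void addReference(String targetNodeId, String propId) {
        refs.computeIfAbsent(targetNodeId, k -> new ArrayList<>()).add(propId);
    }

    byte[] loadNode(String nodeId) {
        return nodes.get(nodeId);
    }

    byte[] loadProperty(String propId) {
        return props.get(propId);
    }

    // Returns an empty list when nothing references the node.
    List<String> loadReferences(String targetNodeId) {
        return refs.getOrDefault(targetNodeId, new ArrayList<>());
    }
}
```

Each of the three maps corresponds to one kind of lookup: node by
identifier, property by identifier, and references by target node.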

> There is the default_binval which has binval_id and binval_data.
> ### Is this table used to store binary data, where binval_id is the
> uuid of the jcr:content that this is referring to and binval_data is
> the actual bytestream blob data


Yes, the binval table stores binary properties when the externalBLOBs
configuration option is set to "false".

The binval_id column contains the property identifier plus value index
(because of multivalued properties) used to identify the binary value,
and the binval_data column contains the actual byte stream.
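
To show why the value index is part of the key, here is a small sketch
that builds one key per value of a multivalued binary property. The
"identifier[index]" layout and the helper name are assumptions made for
illustration; the exact format Jackrabbit writes into binval_id may
differ.

```java
// Hypothetical helper; the key layout is an assumption, not Jackrabbit's format.
class BinvalKeySketch {
    // Combines a property identifier with the index of one of its values,
    // so each value of a multivalued property gets its own row key.
    static String binvalId(String propertyId, int valueIndex) {
        return propertyId + "[" + valueIndex + "]";
    }
}
```

With this layout, a property with two binary values would produce two
binval rows whose keys differ only in the index.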

> There is default_node which has node_id and node_data.
> ###How is this used?


The node_id column contains the unique node identifier and the
node_data column contains the node state in a serialized format [2].

> default_prop with prop_id and prop_data
> ###How is this used?


The prop_id column contains the property identifier, and the prop_data
column contains the property state in a serialized format [2].

> default_refs with node_id and refs_data
> ###How is this used?


The node_id contains the identifier of the reference target node, and
the refs_data contains the list of referencing property identifiers in
a serialized format [2].

> Say the structure is
> /
> --folderA:nt:folder (propertyX:references fileB)
> ----fileA:nt:file
> --fileB:nt:file
> [...]
> My question then is how would the database store the uuids or nodes of
> the structure that is defined above. Very simple structure but to
> understand how this structure is actually translated to be stored in
> the database would be helpful.


You'd have four node rows: the root node, folderA, fileA, and fileB.
The serialized node_data part of the root and folderA nodes would
contain the node identifiers of the child nodes (folderA and fileB
for the root node, and fileA for folderA).

All properties would be stored in the property table. Additionally the
reference from propertyX to fileB would be stored as a separate refs
row with the fileB UUID as the node_id value and a serialized property
identifier list that contains just the propertyX identifier as the
refs_data value.
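
The rows for this example can be sketched roughly as follows. This is
only an illustration: readable names stand in for the real UUIDs, plain
lists stand in for the serialized node_data and refs_data contents, and
the "folderA/propertyX" identifier is a made-up placeholder.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the example rows; all identifiers are placeholders.
class ExampleRowsSketch {
    // node table: node_id -> child node identifiers held in node_data
    static Map<String, List<String>> nodeRows() {
        Map<String, List<String>> rows = new LinkedHashMap<>();
        rows.put("root", Arrays.asList("folderA", "fileB"));
        rows.put("folderA", Collections.singletonList("fileA"));
        // Child lists left empty in this simplified sketch.
        rows.put("fileA", Collections.<String>emptyList());
        rows.put("fileB", Collections.<String>emptyList());
        return rows;
    }

    // refs table: node_id of the reference target -> referencing property ids
    static Map<String, List<String>> refsRows() {
        Map<String, List<String>> rows = new LinkedHashMap<>();
        rows.put("fileB", Collections.singletonList("folderA/propertyX"));
        return rows;
    }
}
```

Only fileB gets a refs row, because it is the only node that is the
target of a reference property.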

I hope this description helps. Note that this only applies to the
traditional database persistence managers. The new bundle persistence
managers in Jackrabbit 1.3 work a bit differently, though the same
identifier->data structure is still in use.

BR,

Jukka Zitting

[1] http://jackrabbit.apache.org/api/1.2.1/org/apache/jackrabbit/core/persistence/PersistenceManager.html
[2] http://jackrabbit.apache.org/api/1.2.1/org/apache/jackrabbit/core/persistence/util/Serializer.html
