On 26/11/13 19:47, Yi Liao wrote:
Hi,

Hi there,


Can anybody explain to me how Jena maps a Node to a NodeId? The
following is stated in
http://jena.apache.org/documentation/tdb/architecture.html


"The Node to NodeId mapping is based on hash of the Node (a 128 bit
MD5 hash - the length was found not to be a major performance factor).

The default storage of the node table is a sequential access file for
the NodeId to Node mapping and a B+Tree for the Node to NodeId
mapping."

My understanding is that Jena hashes the node into a long integer,

Node ->(by calculation) 128 bit value ->(by index) file offset

and somehow converts the hashed value into an address offset to the
node table, and the node information is stored at the address offset
in the node table.

There is a hash to offset index.

The NodeTable itself is heavily cached.
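
As a rough illustration of the Node -> 128 bit value -> file offset chain, here is a minimal sketch in Java. It is not TDB's actual code: the class NodeTableSketch and its methods are hypothetical names, a TreeMap stands in for the B+Tree, an in-memory byte stream stands in for the sequential access file, and the length-prefixed record layout is an assumption made only for the example.

```java
import java.io.ByteArrayOutputStream;
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.TreeMap;

// Hypothetical sketch, not TDB's real classes.
class NodeTableSketch {
    // Stand-in for the B+Tree: ordered map from 128-bit hash (as hex) to offset.
    private final TreeMap<String, Long> hashToOffset = new TreeMap<>();
    // Stand-in for the sequential access file holding the node records.
    private final ByteArrayOutputStream nodeFile = new ByteArrayOutputStream();

    // Node -> (by calculation) 128 bit value.
    static String md5Hex(String nodeText) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(nodeText.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // 128 bit value -> (by index) file offset; appends the node if it is new.
    long getOrAllocate(String nodeText) {
        String hash = md5Hex(nodeText);
        Long existing = hashToOffset.get(hash);
        if (existing != null) {
            return existing;
        }
        long offset = nodeFile.size();  // a new record starts at the current end of file
        byte[] bytes = nodeText.getBytes(StandardCharsets.UTF_8);
        byte[] length = ByteBuffer.allocate(4).putInt(bytes.length).array();
        nodeFile.write(length, 0, length.length);  // assumed layout: 4-byte length prefix
        nodeFile.write(bytes, 0, bytes.length);    // followed by the node text
        hashToOffset.put(hash, offset);
        return offset;
    }

    // File offset -> Node: read the record stored at that offset.
    String nodeFor(long offset) {
        byte[] all = nodeFile.toByteArray();
        int length = ByteBuffer.wrap(all, (int) offset, 4).getInt();
        return new String(all, (int) offset + 4, length, StandardCharsets.UTF_8);
    }
}
```

Calling getOrAllocate twice with the same node text returns the same offset, and nodeFor reverses the mapping by reading the record at that offset.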


Is my understanding correct?

Yes!

How does Jena convert the hashed value
into an address offset? How is the B+ tree used in this process?

TDB uses a B+Tree for the hash to address offset mapping. Although it only needs to be a pure key->value mapping, the B+Tree code is used because it is heavily tested.

The codebase also contains an external hash table, which is pure key->value. Using it made no observable difference (see the cache), so using the B+Tree code was the easy choice, and it does not have the reallocation burstiness of the external hash table.
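
To show why a cache in front of the index can hide the difference between a B+Tree and a hash table, here is a small hypothetical sketch in Java. The class CachedLookupSketch, the LRU eviction policy and the index function passed in are all assumptions made for illustration; TDB's real cache is not shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: an LRU cache in front of whatever index does the real lookup.
class CachedLookupSketch {
    private final Function<String, Long> index;  // stand-in for the B+Tree or hash table
    private final Map<String, Long> cache;
    int indexLookups = 0;                        // how often the underlying index was hit

    CachedLookupSketch(final int maxEntries, Function<String, Long> index) {
        this.index = index;
        // An access-ordered LinkedHashMap evicts the least recently used entry.
        this.cache = new LinkedHashMap<String, Long>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Long> eldest) {
                return size() > maxEntries;
            }
        };
    }

    Long lookup(String nodeText) {
        Long hit = cache.get(nodeText);
        if (hit != null) {
            return hit;            // cache hit: the index is never consulted
        }
        indexLookups++;            // cache miss: fall through to the index
        Long id = index.apply(nodeText);
        if (id != null) {
            cache.put(nodeText, id);
        }
        return id;
    }
}
```

When most lookups are cache hits, indexLookups stays small, so the cost of the structure behind the cache has little effect on overall running time.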


Thanks! Yi Liao

        
        Andy

