Re: how does the node to nodeId mapping work?

Andy Seaborne Tue, 26 Nov 2013 12:39:32 -0800

On 26/11/13 19:47, Yi Liao wrote:

Hi,


Hi there,


Can anybody explain to me how Jena maps a node to a nodeId? The
following is stated in
http://jena.apache.org/documentation/tdb/architecture.html


"The Node to NodeId mapping is based on hash of the Node (a 128 bit
MD5 hash - the length was found not to be a major performance factor).

The default storage of the node table is a sequential access file for
the NodeId to Node mapping and a B+Tree for the Node to NodeId
mapping."

My understanding is that Jena hashes the node into a long integer,


Node ->(by calculation) 128 bit value ->(by index) file offset

and somehow converts the hashed value into an address offset to the
node table, and the node information is stored at the address offset
in the node table.


There is a hash to offset index.

The NodeTable itself is heavily cached.
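
The two-step lookup above (Node -> 128 bit hash -> file offset) can be sketched roughly as follows. This is illustrative Python, not TDB's Java code: the class `NodeTableSketch`, its method names, and the length-prefixed record layout are all assumptions made for the example. A plain dict stands in for the hash-to-offset index and a bytearray stands in for the sequential node file.

```python
import hashlib
import struct


class NodeTableSketch:
    """Toy node table: an MD5 -> offset index plus an append-only node store.

    Hypothetical illustration only; names and record layout are assumptions.
    """

    def __init__(self):
        self.data = bytearray()  # stands in for the sequential node file
        self.index = {}          # stands in for the hash -> offset index

    @staticmethod
    def node_hash(node: str) -> bytes:
        # 128-bit (16-byte) MD5 digest of the node's string form.
        return hashlib.md5(node.encode("utf-8")).digest()

    def get_or_allocate(self, node: str) -> int:
        """Return the offset for `node`, appending it if it is new."""
        key = self.node_hash(node)
        offset = self.index.get(key)
        if offset is None:
            offset = len(self.data)
            encoded = node.encode("utf-8")
            # Record layout (assumed): 4-byte big-endian length, then bytes.
            self.data += struct.pack(">I", len(encoded)) + encoded
            self.index[key] = offset
        return offset

    def node_for(self, offset: int) -> str:
        """Read the node stored at `offset` (the id -> node direction)."""
        (length,) = struct.unpack_from(">I", self.data, offset)
        start = offset + 4
        return self.data[start:start + length].decode("utf-8")


table = NodeTableSketch()
node_id = table.get_or_allocate("<http://example/subject>")
print(node_id, table.node_for(node_id))
```

In this sketch the offset itself plays the role of the node id, so the id -> node direction needs no index at all, only a read at that position.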


Is my understanding correct?


Yes!

How does Jena convert the hashed value
into an address offset? How is B+ tree used in this process?

TDB uses a B+Tree for the hash to address offset. While it only needed to be a pure key->value mapping, the B+Tree code is used as it's heavily tested.

There is in the codebase an external hash table which is pure key->value. Using it did not make an observable difference (see the cache), so using the B+Tree code was easy, and it doesn't have the reallocation burstiness of the external hash table.
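
To illustrate why either structure works here: the lookup only ever asks for an exact key, so an ordered index can answer it just as a hash table can. The sketch below is a hypothetical stand-in, a sorted in-memory list searched with `bisect`, not a real B+Tree and not TDB code; `SortedIndexSketch` and its methods are names invented for this example.

```python
import bisect


class SortedIndexSketch:
    """Exact-match key -> value lookup over keys kept in sorted order.

    Hypothetical stand-in for an ordered index; not a real B+Tree.
    """

    def __init__(self):
        self.keys = []    # sorted list of hash keys (bytes)
        self.values = []  # values[i] is the offset stored for keys[i]

    def put(self, key: bytes, value: int) -> None:
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            self.values[i] = value  # overwrite an existing entry
        else:
            self.keys.insert(i, key)
            self.values.insert(i, value)

    def get(self, key: bytes):
        # Binary search, then confirm the key found is an exact match.
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None
```

The ordering is never exploited for range scans in this use; it is simply a by-product of reusing a well-tested structure.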


Thanks! Yi Liao

        
        Andy
