Andy, are Literals in Statements also hashed with the same MD5 in TDB?

On Tue, Nov 26, 2013 at 3:38 PM, Andy Seaborne <[email protected]> wrote:
> On 26/11/13 19:47, Yi Liao wrote:
>>
>> Hi,
>
> Hi there,
>
>>
>> Can anybody explain to me how does Jena map node to nodeId? The
>> following is stated in
>> http://jena.apache.org/documentation/tdb/architecture.html
>>
>> "The Node to NodeId mapping is based on hash of the Node (a 128 bit
>> MD5 hash - the length was found not to major performance factor).
>>
>> The default storage of the node table is a sequential access file for
>> the NodeId to Node mapping and a B+Tree for the Node to NodeId
>> mapping."
>>
>> My understanding is that Jena hashes the node into a long integer,
>
> Node ->(by calculation) 128 bit value ->(by index) file offset
>
>> and somehow converts the hashed value into an address offset to the
>> node table, and the node information is stored at the address offset
>> in the node table.
>
> There is a hash to offset index.
>
> The NodeTable itself is heavily cached.
>
>>
>> Is my understanding correct?
>
> Yes!
>
>> How does Jena convert the hashed value
>> into an address offset? How is B+ tree used in this process?
>
> TDB uses a B+tree for the hash to address offset. While it only needed to
> be a pure key->value mapping, the B+Tree code is used as it's heavily
> tested.
>
> There is in the codebase an external hash table which is pure key->value.
> Using it did not make an observable difference (see the cache) so using the
> B+Tree code was easy and it doesn't have the reallocate burstiness of the
> external hash table.
>
>>
>> Thanks! Yi Liao
>>
>
> Andy
>
--
---
Marco Neumann
KONA
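
For readers following the thread, the pipeline Andy describes (Node -> 128 bit value by calculation -> file offset by index) can be sketched roughly as below. This is only a toy illustration, not TDB's code: the class ToyNodeTable and its methods are hypothetical names, a java.util.TreeMap stands in for the B+Tree, an in-memory byte stream stands in for the sequential node file, the length-prefixed record layout is an assumption, and hash collisions are ignored. It hashes whatever string form it is given and makes no claim about how TDB serializes literals.

```java
import java.io.ByteArrayOutputStream;
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.TreeMap;

/** Toy node table (hypothetical): hash -> offset index plus an append-only node file. */
class ToyNodeTable {
    // Stand-in for the B+Tree: sorted map from the 128-bit hash to a file offset.
    private final TreeMap<BigInteger, Long> hashToOffset = new TreeMap<>();
    // Stand-in for the sequential-access node file (kept in memory here).
    private final ByteArrayOutputStream nodeFile = new ByteArrayOutputStream();

    /** 128-bit MD5 of the node's string form, as a non-negative integer key. */
    static BigInteger hash(String node) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return new BigInteger(1, md5.digest(node.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should be available on every Java platform", e);
        }
    }

    /** Node -> offset: return the indexed offset, or append the node and index it. */
    long getOrAllocate(String node) {
        BigInteger key = hash(node);
        Long existing = hashToOffset.get(key);
        if (existing != null) {
            return existing;
        }
        long offset = nodeFile.size();
        byte[] bytes = node.getBytes(StandardCharsets.UTF_8);
        // Assumed record layout: 4-byte big-endian length, then the UTF-8 bytes.
        byte[] length = ByteBuffer.allocate(4).putInt(bytes.length).array();
        nodeFile.write(length, 0, length.length);
        nodeFile.write(bytes, 0, bytes.length);
        hashToOffset.put(key, offset);
        return offset;
    }

    /** Offset -> Node: read the record stored at the given offset. */
    String lookup(long offset) {
        byte[] data = nodeFile.toByteArray();
        int start = (int) offset;
        int length = ByteBuffer.wrap(data, start, 4).getInt();
        return new String(data, start + 4, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ToyNodeTable table = new ToyNodeTable();
        long id = table.getOrAllocate("<http://example.org/s>");
        System.out.println(id + " -> " + table.lookup(id));
    }
}
```

In this sketch the offset itself serves as the identifier, so the reverse direction needs no index at all, only a read from the file. The sorted map could be swapped for a plain hash map without changing behaviour, which mirrors Andy's remark that the mapping only needs to be pure key->value.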
