Andy, are Literals in Statements also hashed with the same MD5 in TDB?

On Tue, Nov 26, 2013 at 3:38 PM, Andy Seaborne wrote:
> On 26/11/13 19:47, Yi Liao wrote:
>>
>> Hi,
>
>
> Hi there,
>
>
>>
>> Can anybody explain to me how Jena maps a Node to a NodeId? The
>> following is stated in
>> http://jena.apache.org/documentation/tdb/architecture.html
>>
>>
>> "The Node to NodeId mapping is based on hash of the Node (a 128 bit
>> MD5 hash - the length was found not to be a major performance factor).
>>
>> The default storage of the node table is a sequential access file for
>> the NodeId to Node mapping and a B+Tree for the Node to NodeId
>> mapping."
>>
>> My understanding is that Jena hashes the node into a long integer,
>
>
> Node ->(by calculation) 128 bit value ->(by index) file offset
>
>
>> and somehow converts the hashed value into an address offset to the
>> node table, and the node information is stored at the address offset
>> in the node table.
>
>
> There is a hash to offset index.
>
> The NodeTable itself is heavily cached.
>
>>
>> Is my understanding correct?
>
>
> Yes!
>
>
>> How does Jena convert the hashed value
>> into an address offset? How is the B+Tree used in this process?
>
>
> TDB uses a B+Tree for the hash to address offset mapping. While it only
> needed to be a pure key->value mapping, the B+Tree code is used because
> it's heavily tested.
>
> There is also an external hash table in the codebase which is a pure
> key->value mapping. Using it did not make an observable difference (see
> the cache), so using the B+Tree code was easy, and it doesn't have the
> reallocation burstiness of the external hash table.
>
>>
>> Thanks! Yi Liao
>>
>
>         Andy
>
>
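A minimal Java sketch of the path described in this thread, Node ->(MD5) 128 bit value ->(index) file offset -> Node, may help make it concrete. It is a toy under stated assumptions, not TDB's code: nodes are plain strings in an N-Triples-like form, a java.util.TreeMap stands in for the B+Tree, an in-memory byte buffer stands in for the sequential node file, and a HashMap stands in for the node cache. The names ToyNodeTable, getOrAllocate and getNode are hypothetical, not Jena/TDB API, and the sketch ignores hash collisions and durability.

```java
import java.io.ByteArrayOutputStream;
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy model with hypothetical names (not TDB's API):
// Node -> (MD5) 128-bit key -> (sorted index) file offset -> Node.
final class ToyNodeTable {
    // Stand-in for the B+Tree: sorted map from 128-bit hash to file offset.
    private final TreeMap<BigInteger, Long> hashToOffset = new TreeMap<>();
    // Stand-in for the sequential node file: length-prefixed records.
    private final ByteArrayOutputStream nodeFile = new ByteArrayOutputStream();
    // Stand-in for the node cache: offset -> node, filled on reads.
    private final Map<Long, String> cache = new HashMap<>();

    // 128-bit MD5 digest of the node's (assumed) string encoding.
    static BigInteger hash(String encodedNode) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(encodedNode.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest); // 16 bytes, read as unsigned
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Node -> NodeId: look the hash up in the index; append the node if new.
    // Hash collisions are ignored in this sketch.
    long getOrAllocate(String encodedNode) {
        BigInteger key = hash(encodedNode);
        Long existing = hashToOffset.get(key);
        if (existing != null) {
            return existing;
        }
        long offset = nodeFile.size();
        byte[] bytes = encodedNode.getBytes(StandardCharsets.UTF_8);
        byte[] length = ByteBuffer.allocate(4).putInt(bytes.length).array();
        nodeFile.write(length, 0, length.length); // 4-byte big-endian length
        nodeFile.write(bytes, 0, bytes.length);   // then the encoded node
        hashToOffset.put(key, offset);
        return offset;
    }

    // NodeId -> Node: read the record at the offset, consulting the cache first.
    String getNode(long offset) {
        return cache.computeIfAbsent(offset, off -> {
            ByteBuffer file = ByteBuffer.wrap(nodeFile.toByteArray());
            int pos = Math.toIntExact(off);
            int len = file.getInt(pos);
            return new String(file.array(), pos + 4, len, StandardCharsets.UTF_8);
        });
    }

    public static void main(String[] args) {
        ToyNodeTable table = new ToyNodeTable();
        long iri = table.getOrAllocate("<http://example.org/s>");
        long literal = table.getOrAllocate("\"hello\"@en");
        System.out.println(iri == table.getOrAllocate("<http://example.org/s>")); // true
        System.out.println(table.getNode(literal)); // "hello"@en
    }
}
```

In this toy every node, IRI or literal, goes through the same hash. Note also that a sorted index is more than a pure key->value lookup needs, which fits the point above that the B+Tree was chosen because its code was already well tested, not because ordering is required.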



-- 


---
Marco Neumann
KONA
