Sebastian Schaffert created MARMOTTA-469:
--------------------------------------------

             Summary: KiWi: Hashing-Based ID Generation
                 Key: MARMOTTA-469
                 URL: https://issues.apache.org/jira/browse/MARMOTTA-469
             Project: Marmotta
          Issue Type: Improvement
          Components: KiWi Triple Store
            Reporter: Sebastian Schaffert
            Assignee: Sebastian Schaffert
             Fix For: 3.3


The KiWi triple store currently generates unique IDs for nodes and triples 
using a kind of sequence generator. Snowflake is generally very fast, but to 
ensure that the same object always gets the same ID, a lot of synchronization is 
necessary (immediate commit for nodes, triple registry for triples), which has 
a considerable performance impact, particularly in clustered environments.

A much faster approach would be to compute the ID from the objects themselves, 
e.g. using an efficient hash function with good distribution. With a 64-bit 
hash, the probability of a collision starts getting serious at around 2 billion 
objects (probability 10%), so it might make sense to switch to 128-bit keys as 
well.
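
As a rough illustration of the idea, the sketch below derives a 64-bit ID from 
a node's content and estimates the collision probability with the usual 
birthday approximation p = 1 - exp(-n^2 / 2^(b+1)) for n objects and a b-bit 
hash. Everything in it is an assumption made for illustration: the class and 
method names are hypothetical, it is not KiWi code, and SHA-256 from the JDK 
only stands in for whichever fast hash function would actually be chosen.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch, not part of KiWi.
final class HashIdSketch {

    // Derive a 64-bit ID from the object's content, so the same content
    // always yields the same ID without any shared counter or registry.
    // Assumption: the first 8 bytes of a SHA-256 digest serve as the ID.
    static long hashId(String content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(content.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.wrap(digest).getLong();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    // Birthday approximation: p = 1 - exp(-n^2 / 2^(bits+1)).
    // expm1 keeps precision when the exponent is extremely small.
    static double collisionProbability(double n, int bits) {
        return -Math.expm1(-(n * n) / Math.pow(2.0, bits + 1));
    }

    public static void main(String[] args) {
        System.out.println(hashId("http://example.org/resource/1"));
        System.out.println(collisionProbability(2e9, 64));
        System.out.println(collisionProbability(2e9, 128));
    }
}
```

For 2 billion objects the approximation gives roughly 0.10 with 64 bits, which 
matches the figure above, and a value on the order of 1e-21 with 128 bits.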

A good overview of collision probabilities is given in:




--
This message was sent by Atlassian JIRA
(v6.2#6252)