Sebastian Schaffert created MARMOTTA-469:
--------------------------------------------
Summary: KiWi: Hashing-Based ID Generation
Key: MARMOTTA-469
URL: https://issues.apache.org/jira/browse/MARMOTTA-469
Project: Marmotta
Issue Type: Improvement
Components: KiWi Triple Store
Reporter: Sebastian Schaffert
Assignee: Sebastian Schaffert
Fix For: 3.3
The KiWi triple store currently generates unique IDs for nodes and triples
using a kind of sequence generator. Snowflake is generally very fast, but
ensuring that the same object always gets the same ID requires a lot of
synchronization (immediate commit for nodes, triple registry for triples),
which has a considerable performance impact, particularly in clustered
environments.
A much faster approach would be to compute the ID from the objects themselves,
e.g. using an efficient, well-distributed hash function. With a 64-bit hash, the
probability of a collision becomes serious at around 2 billion objects
(roughly 10%), so it might make sense to switch to 128-bit keys as well.
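As a rough illustration only, the following Java sketch derives a 64-bit ID
from an object's content and reproduces the collision estimate above using the
birthday-bound approximation p = 1 - exp(-n^2 / 2^(bits+1)). The class name
HashIdSketch, both method names, and the choice of a truncated SHA-256 digest
are assumptions made for this example; none of them is existing KiWi code, and
a faster non-cryptographic hash may be preferable in practice.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch, not Marmotta code: all names here are illustrative.
final class HashIdSketch {

    // Derive a 64-bit ID from the object's content by taking the first
    // 8 bytes of its SHA-256 digest. The same content always yields the
    // same ID, so no shared sequence or registry lookup is needed.
    // SHA-256 is an assumption, chosen only because it ships with the JDK.
    static long idFor(String content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(content.getBytes(StandardCharsets.UTF_8));
            long id = 0L;
            for (int i = 0; i < 8; i++) {
                id = (id << 8) | (digest[i] & 0xFFL);
            }
            return id;
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    // Birthday-bound approximation of the probability that at least two of
    // `objects` uniformly hashed items share the same `bits`-bit hash value.
    static double collisionProbability(double objects, int bits) {
        double exponent = -(objects * objects) / Math.pow(2.0, bits + 1);
        // expm1 keeps precision when the exponent is very close to zero.
        return -Math.expm1(exponent);
    }
}
```

Under this approximation, HashIdSketch.collisionProbability(2e9, 64) comes out
at about 0.10, matching the 10% figure above, while the same call with 128 bits
yields a value on the order of 1e-21.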
A good overview of collision probabilities is given in: