[
https://issues.apache.org/jira/browse/MARMOTTA-469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Schaffert updated MARMOTTA-469:
-----------------------------------------
Fix Version/s: (was: 3.3.0)
> KiWi: Hashing-Based ID Generation
> ---------------------------------
>
> Key: MARMOTTA-469
> URL: https://issues.apache.org/jira/browse/MARMOTTA-469
> Project: Marmotta
> Issue Type: Improvement
> Components: KiWi Triple Store
> Reporter: Sebastian Schaffert
> Assignee: Sebastian Schaffert
>
> The KiWi triple store currently generates unique IDs for nodes and triples
> using a kind of sequence generator (Snowflake). This is generally very fast,
> but ensuring that the same object always gets the same ID requires a lot of
> synchronization (immediate commit for nodes, triple registry for triples),
> which has a considerable performance impact, particularly in clustered
> environments.
> A much faster approach would be to compute the ID from the objects
> themselves, e.g. using an efficient and good hashing function. With a 64-bit
> hash, the probability of collisions becomes serious at around 2 billion
> objects (about 10%), so it might make sense to switch to 128-bit keys as
> well.
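A minimal sketch of deriving an ID from the object itself, under stated
assumptions: the class name HashIdSketch, the method names and the "canonical
string form" of a node are hypothetical and not part of Marmotta, and SHA-256
is used only because it ships with every JDK. A faster non-cryptographic hash
could be swapped in.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Marmotta: derives IDs from node content.
final class HashIdSketch {

    // Hash the canonical string form of a node with SHA-256 (an assumed choice).
    static byte[] digest(String canonicalForm) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return md.digest(canonicalForm.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    // 64-bit ID: the first 8 bytes of the digest, read as a big-endian long.
    static long id64(String canonicalForm) {
        return ByteBuffer.wrap(digest(canonicalForm)).getLong();
    }

    // 128-bit ID: the first 16 bytes of the digest as two longs {high, low}.
    static long[] id128(String canonicalForm) {
        ByteBuffer buf = ByteBuffer.wrap(digest(canonicalForm));
        return new long[] { buf.getLong(), buf.getLong() };
    }

    public static void main(String[] args) {
        // The "uri:" prefix is an assumed way to keep node types apart.
        String node = "uri:http://example.org/resource/1";
        System.out.println(Long.toHexString(id64(node)));
        System.out.println(id64(node) == id64(node)); // same input, same ID
    }
}
```

Because the ID depends only on the content, two cluster members computing it
independently arrive at the same value without any coordination.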
> A good overview of hash collision probabilities is given in:
> http://preshing.com/20110504/hash-collision-probabilities/
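The 10% figure can be checked with the usual birthday approximation,
p = 1 - exp(-n(n-1) / (2 * 2^bits)), assuming an ideal, uniformly distributed
hash. The class and method names below are illustrative only.

```java
// Hypothetical helper: birthday-bound estimate for hash-based IDs.
final class CollisionSketch {

    // Approximate probability that at least two of n objects share an ID
    // when IDs are drawn uniformly from a space of 2^bits values.
    static double collisionProbability(double n, int bits) {
        double space = Math.pow(2.0, bits);
        // expm1 keeps precision when the probability is extremely small.
        return -Math.expm1(-n * (n - 1.0) / (2.0 * space));
    }

    public static void main(String[] args) {
        double objects = 2e9; // 2 billion nodes and triples
        System.out.printf("64-bit:  %.4f%n", collisionProbability(objects, 64));
        System.out.printf("128-bit: %.3e%n", collisionProbability(objects, 128));
    }
}
```

For 2 billion objects this gives roughly 10% with 64-bit IDs, in line with the
figure quoted above, while the 128-bit value is negligible at that scale.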
> Changes would affect the API for ID generation (IDGenerator) as well as the
> value factory. In addition, we would need to ignore duplicate IDs for
> database inserts, e.g. using triggers or merge. Finally, we need to rethink
> the behaviour of deleted/non-deleted triples.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)