[ 
https://issues.apache.org/jira/browse/MARMOTTA-469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Schaffert updated MARMOTTA-469:
-----------------------------------------
    Fix Version/s:     (was: 3.3.0)

> KiWi: Hashing-Based ID Generation
> ---------------------------------
>
>                 Key: MARMOTTA-469
>                 URL: https://issues.apache.org/jira/browse/MARMOTTA-469
>             Project: Marmotta
>          Issue Type: Improvement
>          Components: KiWi Triple Store
>            Reporter: Sebastian Schaffert
>            Assignee: Sebastian Schaffert
>
> The KiWi triple store currently generates unique IDs for nodes and triples 
> using a kind of sequence generator (Snowflake). The generator itself is 
> generally very fast, but ensuring that the same object always gets the same 
> ID requires a lot of synchronization (immediate commit for nodes, triple 
> registry for triples), which has a considerable performance impact, 
> particularly in clustered environments.
> A much faster approach would be to compute the ID from the objects 
> themselves, e.g. using an efficient, well-distributed hash function. With a 
> 64-bit hash, the probability of a collision becomes serious at around 2 
> billion objects (roughly 10%), so it might make sense to switch to 128-bit 
> keys as well.
> A good overview of hash collision probabilities is given in:
> http://preshing.com/20110504/hash-collision-probabilities/
> Changes would affect the API for ID generation (IDGenerator) as well as the 
> value factory. In addition, we would need to ignore duplicate IDs for 
> database inserts, e.g. using triggers or merge. Finally, we need to rethink 
> the behaviour of deleted/non-deleted triples.
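
To make the idea in the description concrete, here is a minimal sketch of 
content-derived IDs together with the birthday-bound estimate behind the 
"2 billion objects, roughly 10%" figure. It is an illustration only, not 
the Marmotta implementation: the class HashIds and its methods are 
hypothetical names, and SHA-256 truncated to 64 or 128 bits is an assumed 
stand-in for whichever "efficient and good" hash function would actually be 
chosen (a non-cryptographic hash would likely be faster).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of the KiWi API.
final class HashIds {

    /** 64-bit ID: the first 8 bytes of SHA-256 over the object's string form. */
    static long id64(String content) {
        return ByteBuffer.wrap(sha256(content)).getLong();
    }

    /** 128-bit ID: the first 16 bytes of the same digest, as two longs. */
    static long[] id128(String content) {
        ByteBuffer buf = ByteBuffer.wrap(sha256(content));
        return new long[] { buf.getLong(), buf.getLong() };
    }

    /**
     * Birthday-bound approximation of the probability that at least two of
     * n uniformly hashed objects share an ID in a space of 2^bits values:
     * p ~= 1 - exp(-n(n-1) / (2 * 2^bits)).
     */
    static double collisionProbability(double n, int bits) {
        double space = Math.pow(2.0, bits);
        // expm1 keeps precision when the exponent is very close to zero.
        return -Math.expm1(-n * (n - 1) / (2.0 * space));
    }

    private static byte[] sha256(String s) {
        try {
            return MessageDigest.getInstance("SHA-256")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The same input always yields the same ID, with no shared counter.
        System.out.println(Long.toHexString(id64("http://example.org/resource/1")));
        System.out.println(collisionProbability(2e9, 64));
        System.out.println(collisionProbability(2e9, 128));
    }
}
```

Because the ID depends only on the object's content, two cluster nodes that 
see the same object derive the same ID without coordinating, which is why 
the description notes that database inserts would then have to tolerate 
duplicate IDs.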



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
