Henrik

I'm not familiar with SDB so can't give you a precise answer but I'll do my best to address your question, see answers inline:

On 25/03/2014 09:56, "Henrik Alstad" <[email protected]> wrote:

>Hi.
>I'm looking at the SDB indexes for the Nodes table,
>and I got a very naive question.
>
>There is a hash based Nodes table,
>where there is no ID column, and the primary key of the table is the hash.
>
>What happens if two of the lex-values by some insane coincidence get the
>same hash?

I assume that you do get an error as you suggest.

Regardless of the hash function used there is always some collision probability. SDB uses MD5. Note that the 2^20.96 figure listed at http://en.wikipedia.org/wiki/Comparison_of_cryptographic_hash_functions#Cryptanalysis is the cost of the best known deliberate collision attack on MD5, not the chance of two values colliding by accident. For a well-distributed b-bit hash, the chance of any accidental collision among n distinct values is roughly n^2 / 2^(b+1), which for a full 128-bit MD5 digest is vanishingly small at any realistic number of nodes.

>Is there a built-in handling for this case, or is the hash-function
>somehow
>specified to be unique? or is there actually a slight possibility of just
>getting sdb-errors ("non-unique primary key" or something like that) when
>using that version?

AFAIK there is no handling for this situation.

SDB is not actively supported by the project and is in maintenance mode; none of the active developers are familiar with the code or with maintaining it. It has also been shown to scale poorly compared to native triple stores.

The project recommends the use of the native TDB triple store instead, which is actively maintained and developed. Of course we recognise that there are various reasons why you might want a SQL backed solution - IT infrastructure limitations, project requirements, better failover support etc - but just be aware that you will be somewhat on your own wrt support if you choose to pursue SDB. While the project will try to fix bugs, we don't necessarily have the expertise to fix anything non-trivial.

Rob

>--
>Cheers,
>Henrik Kjus Alstad
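
P.S. A rough way to gauge the odds of an accidental hash collision is the standard birthday approximation. The Python sketch below is only an illustration under stated assumptions: it assumes the hash output is uniformly distributed, the function name collision_probability is my own, and the 64-bit case is a hypothetical comparison to show how sensitive the result is to the stored hash width, not a statement about what SDB actually stores.

```python
import math


def collision_probability(n_values: int, hash_bits: int) -> float:
    """Approximate chance that at least two of n_values distinct inputs
    share a hash, assuming a uniformly distributed hash_bits-bit hash.

    Uses the birthday approximation p ~= 1 - exp(-n(n-1) / (2 * 2^bits)).
    """
    space = 2.0 ** hash_bits
    exponent = -n_values * (n_values - 1) / (2.0 * space)
    # expm1 keeps precision when the probability is extremely small
    return -math.expm1(exponent)


if __name__ == "__main__":
    one_billion = 10 ** 9
    # Full 128-bit digest (e.g. an untruncated MD5 value)
    print(collision_probability(one_billion, 128))
    # Hypothetical 64-bit hash column, for comparison only
    print(collision_probability(one_billion, 64))
```

With a billion distinct values the 128-bit case comes out around 1.5e-21, while the hypothetical 64-bit case is already a few percent, so the width of the hash actually stored in the primary key matters far more than the choice of MD5 itself.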
