[jira] Updated: (JCR-2760) Use hash codes instead of sequence numbers for string indexes

Jukka Zitting (JIRA) Tue, 28 Sep 2010 09:15:59 -0700

     [ https://issues.apache.org/jira/browse/JCR-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Jukka Zitting updated JCR-2760:
-------------------------------

    Attachment: 0001-JCR-2760-Use-hash-codes-instead-of-sequence-numbers.patch

The attached patch implements the proposed solution.

It turned out that the BundleBinding class uses only 24 bits of the index 
number, so this implementation does the same when allocating the index numbers.

The solution is fully backwards compatible with existing repositories (existing 
index numbers are used as-is), and resolves hash collisions by explicitly 
incrementing the index number until a free one is found.
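
For readers without the patch at hand, the allocation scheme described above can 
be sketched roughly as follows. This is only an illustration and not the code in 
the attached patch: the class name HashedStringIndex, its method names, and the 
use of in-memory maps (rather than the ns_idx.properties file) are assumptions 
made for the example. It masks String.hashCode() to 24 bits, keeps any 
previously stored mappings untouched, and probes upwards on a collision.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not the class from the attached patch.
class HashedStringIndex {

    // Only the low 24 bits of an index are usable, per the note on BundleBinding.
    private static final int MASK = 0x00ffffff;

    private final Map<String, Integer> stringToIndex = new HashMap<>();
    private final Map<Integer, String> indexToString = new HashMap<>();

    // Existing mappings (e.g. loaded from an older repository) are kept as-is.
    HashedStringIndex(Map<String, Integer> existing) {
        for (Map.Entry<String, Integer> e : existing.entrySet()) {
            stringToIndex.put(e.getKey(), e.getValue());
            indexToString.put(e.getValue(), e.getKey());
        }
    }

    // Returns the index of the given string, allocating a new one if needed.
    synchronized int stringToIndex(String string) {
        Integer known = stringToIndex.get(string);
        if (known != null) {
            return known;
        }
        // Start from the 24-bit hash code and step forward until a free slot
        // is found. Assumes far fewer than 2^24 strings are ever registered,
        // otherwise this loop would not terminate.
        int candidate = string.hashCode() & MASK;
        while (indexToString.containsKey(candidate)) {
            candidate = (candidate + 1) & MASK;
        }
        stringToIndex.put(string, candidate);
        indexToString.put(candidate, string);
        return candidate;
    }

    // Returns the string registered for the index, or null if there is none.
    synchronized String indexToString(int index) {
        return indexToString.get(index);
    }
}
```

With this sketch, two repositories that register the same namespace URIs get the 
same index numbers unless a collision occurs and the colliding URIs were 
registered in a different order.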

> Use hash codes instead of sequence numbers for string indexes
> -------------------------------------------------------------
>
>                 Key: JCR-2760
>                 URL: https://issues.apache.org/jira/browse/JCR-2760
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-core
>            Reporter: Jukka Zitting
>            Priority: Minor
>         Attachments: 
> 0001-JCR-2760-Use-hash-codes-instead-of-sequence-numbers.patch
>
>
> We use index numbers instead of namespace URIs or other strings in many 
> places. The two-way mapping between namespace URIs and index numbers is by 
> default stored in the repository-global ns_idx.properties file, and the index 
> numbers are allocated using a linear sequence. The problem with this approach 
> is that two repositories will easily end up with different string index 
> mappings, which makes it practically impossible to make low-level copies of 
> workspace content across repositories.
> The ultimate solution for this problem would be to store the namespace URIs 
> closer to the stored content, ideally as an implementation detail of a 
> persistence manager.
> An easier short-term solution would be to decrease the chances of two 
> repositories having different string index mappings. A simple (and 
> backwards-compatible) way to do this is to use the hash code of a namespace 
> URI as the basis of allocating a new index number. Hash collisions are fairly 
> unlikely, and can be handled by incrementing the initial hash code until the 
> collision is avoided. In the common case of no collisions (with a uniform 
> hash function the chance of a collision is less than 1% even with thousands of 
> registered namespaces) this solution allows workspaces to be copied between 
> repositories without worrying about the namespace index mappings.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
