[jira] Resolved: (JCR-2760) Use hash codes instead of sequence numbers for string indexes

Jukka Zitting (JIRA) Tue, 05 Oct 2010 01:40:04 -0700

     [ 
https://issues.apache.org/jira/browse/JCR-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Jukka Zitting resolved JCR-2760.
--------------------------------

       Resolution: Fixed
    Fix Version/s: 2.2.0
         Assignee: Jukka Zitting

Patch committed in revision 1004568.

The restriction to 24 bits is an unfortunate side-effect of the way the bundle 
serialization format chops off the first eight bits from the namespace index of 
the name of the primary type of a node. As mentioned by Thomas, this has a 
notable effect on the likelihood of collisions, but that seems OK since this 
only affects the non-standard use case of copying raw workspace data between 
repositories and even there the chance of problems is pretty low for reasonably 
sized repositories.

A more complete solution would be to drop the use of namespace and name indexes 
in favour of a more efficient serialization format.

> Use hash codes instead of sequence numbers for string indexes
> -------------------------------------------------------------
>
>                 Key: JCR-2760
>                 URL: https://issues.apache.org/jira/browse/JCR-2760
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-core
>            Reporter: Jukka Zitting
>            Assignee: Jukka Zitting
>            Priority: Minor
>             Fix For: 2.2.0
>
>         Attachments: 
> 0001-JCR-2760-Use-hash-codes-instead-of-sequence-numbers.patch
>
>
> We use index numbers instead of namespace URIs or other strings in many 
> places. The two-way mapping between namespace URIs and index numbers is by 
> default stored in the repository-global ns_idx.properties file, and the index 
> numbers are allocated using a linear sequence. The problem with this approach 
> is that two repositories will easily end up with different string index 
> mappings, which makes it practically impossible to make low-level copies of 
> workspace content across repositories.
> The ultimate solution for this problem would be to store the namespace URIs 
> closer to the stored content, ideally as an implementation detail of a 
> persistence manager.
> An easier short-term solution would be to decrease the chances of two 
> repositories having different string index mappings. A simple (and 
> backwards-compatible) way to do this is to use the hash code of a namespace 
> URI as the basis of allocating a new index number. Hash collisions are fairly 
> unlikely, and can be handled by incrementing the intial hash code until the 
> collision is avoided. In the common case of no collisions (with a uniform 
> hash function the chance of a collision is less than 1% even with tousands of 
> registered namespaces) this solution allows workspaces to be copied between 
> repositories without worrying about the namespace index mappings.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Resolved: (JCR-2760) Use hash codes instead of sequence numbers for string indexes

Reply via email to