[
https://issues.apache.org/jira/browse/JCR-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jukka Zitting resolved JCR-2760.
--------------------------------
Resolution: Fixed
Fix Version/s: 2.2.0
Assignee: Jukka Zitting
Patch committed in revision 1004568.
The restriction to 24 bits is an unfortunate side-effect of the way the bundle
serialization format chops off the first eight bits from the namespace index of
the name of the primary type of a node. As mentioned by Thomas, this has a
notable effect on the likelihood of collisions, but that seems OK since this
only affects the non-standard use case of copying raw workspace data between
repositories and even there the chance of problems is pretty low for reasonably
sized repositories.
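
For a rough sense of what the 24-bit restriction means for collision odds, the standard birthday approximation can be evaluated directly. The sketch below is only illustrative: the class and method names are made up for this message, and it assumes an ideal uniform hash, which String.hashCode() only approximates.

```java
public class CollisionOdds {

    /**
     * Approximate probability that at least two of n uniformly hashed
     * values land on the same slot when there are 2^bits slots.
     * Uses the birthday approximation 1 - exp(-n(n-1) / (2 * slots)).
     */
    public static double collisionProbability(int n, int bits) {
        double slots = Math.pow(2, bits);
        double exponent = -((double) n * (n - 1)) / (2 * slots);
        // expm1(x) = exp(x) - 1, so the negation gives 1 - exp(x)
        // without losing precision for tiny probabilities.
        return -Math.expm1(exponent);
    }

    public static void main(String[] args) {
        // Compare the full 32-bit hash space with the 24 bits that
        // survive the bundle serialization format.
        for (int n : new int[] {100, 1000, 5000}) {
            System.out.printf("n=%d  32 bits: %.6f  24 bits: %.6f%n",
                    n,
                    collisionProbability(n, 32),
                    collisionProbability(n, 24));
        }
    }
}
```

Under these assumptions, a thousand namespaces give a collision chance of a few percent with 24 bits, against roughly a hundredth of a percent with 32 bits.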
A more complete solution would be to drop the use of namespace and name indexes
in favour of a more efficient serialization format.
> Use hash codes instead of sequence numbers for string indexes
> -------------------------------------------------------------
>
> Key: JCR-2760
> URL: https://issues.apache.org/jira/browse/JCR-2760
> Project: Jackrabbit Content Repository
> Issue Type: Improvement
> Components: jackrabbit-core
> Reporter: Jukka Zitting
> Assignee: Jukka Zitting
> Priority: Minor
> Fix For: 2.2.0
>
> Attachments:
> 0001-JCR-2760-Use-hash-codes-instead-of-sequence-numbers.patch
>
>
> We use index numbers instead of namespace URIs or other strings in many
> places. The two-way mapping between namespace URIs and index numbers is by
> default stored in the repository-global ns_idx.properties file, and the index
> numbers are allocated using a linear sequence. The problem with this approach
> is that two repositories will easily end up with different string index
> mappings, which makes it practically impossible to make low-level copies of
> workspace content across repositories.
> The ultimate solution for this problem would be to store the namespace URIs
> closer to the stored content, ideally as an implementation detail of a
> persistence manager.
> An easier short-term solution would be to decrease the chances of two
> repositories having different string index mappings. A simple (and
> backwards-compatible) way to do this is to use the hash code of a namespace
> URI as the basis of allocating a new index number. Hash collisions are fairly
> unlikely, and can be handled by incrementing the initial hash code until the
> collision is avoided. In the common case of no collisions (with a uniform
> hash function the chance of a collision is less than 1% even with thousands of
> registered namespaces) this solution allows workspaces to be copied between
> repositories without worrying about the namespace index mappings.
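
The allocation scheme described above can be sketched roughly as follows. This is not the committed patch: the class name and method names are hypothetical, and the 0x00ffffff mask is an assumption about how the 24-bit restriction mentioned in the resolution comment might be applied.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a hash-based string index; not Jackrabbit code. */
public class HashIndexSketch {

    // Assumption: only the low 24 bits of the hash code are usable.
    private static final int MASK = 0x00ffffff;

    private final Map<String, Integer> stringToIndex = new HashMap<>();
    private final Map<Integer, String> indexToString = new HashMap<>();

    /** Returns the index of the given string, allocating one if needed. */
    public synchronized int stringToIndex(String string) {
        Integer existing = stringToIndex.get(string);
        if (existing != null) {
            return existing;
        }
        // Start from the hash code and increment until a free index is
        // found. Assumes far fewer than 2^24 registered strings, so the
        // loop always terminates.
        int index = string.hashCode() & MASK;
        while (indexToString.containsKey(index)) {
            index = (index + 1) & MASK;
        }
        stringToIndex.put(string, index);
        indexToString.put(index, string);
        return index;
    }

    /** Returns the string for the given index, or null if unknown. */
    public synchronized String indexToString(int index) {
        return indexToString.get(index);
    }
}
```

As long as no collision occurs, two independent instances assign the same index to the same string regardless of registration order. When a collision does occur, the result depends on which string was registered first, so the mappings can still diverge in that case.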