[ http://issues.apache.org/jira/browse/DIRSERVER-733?page=all ]
Alex Karasulu closed DIRSERVER-733.
-----------------------------------
Resolution: Fixed
Assignee: Alex Karasulu
This has been fixed already. Just adding it to JIRA for tracking.
> More efficient handling of duplicate keys in indices
> ----------------------------------------------------
>
> Key: DIRSERVER-733
> URL: http://issues.apache.org/jira/browse/DIRSERVER-733
> Project: Directory ApacheDS
> Issue Type: Improvement
> Components: core
> Affects Versions: pre-1.0, 1.0-RC1, 1.0-RC2, 1.0-RC3, 1.1.0, 1.0
> Reporter: Alex Karasulu
> Assigned To: Alex Karasulu
> Fix For: 1.1.0, 1.0
>
>
> Here's the description of the problem and the solution that I posted to the
> mailing list:
> Hi again,
> I've completed my optimization on indices under high partition capacities.
> The results are in line and side by side to the old results. For those not
> interested in the exact optimization that took place skip down to the results.
> The problem:
> ------------
> Up to now duplicate keys (keys with many values) were stored using a TreeSet
> within the B+Trees of indices. Indices usually map some attribute value to
> an ID into the master table for an entry. The ID is a sequentially unique
> value assigned to the entry.
> So if 100 people have initials AA then there would be 100 values in the
> TreeSet for the key 'AA' in the intials index. This would be serialized and
> deserialized to and from disk. At these low numbers there's really very
> little impact. However certain indices were seriously impacted like the
> objectClass index or the hierarchy based system index. Just imagine having 1
> million entries of inetOrgPersons. The objectClass index would then have 1
> million values in the TreeSet for the key 'inetOrgPerson'. This would
> seriously hurt performance from both a space and time perspective. It would
> essentially obfuscate the benefits of using BTree's in the first place with
> large numbers of entries.
> Solution:
> ---------
> What I did was add a configurable threshold parameter when instead of using a
> TreeSet to store duplicates another B+Tree was created and used within the
> same index database file. Right now the default value for this which these
> tests were conducted with is 1000. So after having 1000 duplicate values the
> server switches from using in memory TreeSets to on disk B+Trees for only
> storing values for that key.
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira