[ http://issues.apache.org/jira/browse/DIRSERVER-733?page=all ]

Alex Karasulu closed DIRSERVER-733.
-----------------------------------

    Resolution: Fixed
      Assignee: Alex Karasulu

This has been fixed already.  Just adding it to JIRA for tracking.

> More efficient handling of duplicate keys in indices
> ----------------------------------------------------
>
>                 Key: DIRSERVER-733
>                 URL: http://issues.apache.org/jira/browse/DIRSERVER-733
>             Project: Directory ApacheDS
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: pre-1.0, 1.0-RC1, 1.0-RC2, 1.0-RC3, 1.1.0, 1.0
>            Reporter: Alex Karasulu
>         Assigned To: Alex Karasulu
>             Fix For: 1.1.0, 1.0
>
>
> Here's the description of the problem and the solution that I posted to the 
> mailing list:
> Hi again,
> I've completed my optimization on indices under high partition capacities.  
> The results are in line and side by side to the old results.  For those not 
> interested in the exact optimization that took place skip down to the results.
> The problem:
> ------------
> Up to now duplicate keys (keys with many values) were stored using a TreeSet 
> within the B+Trees of indices.  Indices usually map some attribute value to 
> an ID into the master table for an entry.  The ID is a sequentially unique 
> value assigned to the entry.
> So if 100 people have initials AA then there would be 100 values in the 
> TreeSet for the key 'AA' in the intials index.  This would be serialized and 
> deserialized to and from disk.  At these low numbers there's really very 
> little impact.  However certain indices were seriously impacted like the 
> objectClass index or the hierarchy based system index.  Just imagine having 1 
> million entries of inetOrgPersons.  The objectClass index would then have 1 
> million values in the TreeSet for the key 'inetOrgPerson'.  This would 
> seriously hurt performance from both a space and time perspective.  It would 
> essentially obfuscate the benefits of using BTree's in the first place with 
> large numbers of entries.
> Solution:
> ---------
> What I did was add a configurable threshold parameter when instead of using a 
> TreeSet to store duplicates another B+Tree was created and used within the 
> same index database file.  Right now the default value for this which these 
> tests were conducted with is 1000.  So after having 1000 duplicate values the 
> server switches from using in memory TreeSets to on disk B+Trees for only 
> storing values for that key.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to