Re: RDN index, oneLevel and sublevel index merge

Stefan Seelmann Tue, 18 Oct 2011 15:01:45 -0700

On Sun, Oct 9, 2011 at 7:00 PM, Emmanuel Lecharny <elecha...@apache.org> wrote:
>
>
> On Sat, Oct 8, 2011 at 2:52 PM, Stefan Seelmann <seelm...@apache.org> wrote:
>>
>> Hi Emmanuel,
>>
>> On Sat, Oct 8, 2011 at 8:27 AM, Emmanuel Lecharny <elecha...@gmail.com>
>> wrote:
>> > Hi guys,
>> >
>> > Stefan started to modify the code to get rid of the oneLevel and
>> > subLevel
>> > index, which are more or less useless as we already have the hierarchy
>> > stored into the rdn index.
>> >
>> > However, this rdn index is not good enough as is to be use as a
>> > replacement
>> > for the two other indexes. Its structure forbid us to easily retrieve
>> > the
>> > children from a known entry.
>> >
>> > The current RDN index structure is :
>> > <parentId, RDN> -> Entry ID
>> >
>> > The key is a tuple containing the parent ID to be able to rebuild the
>> > DN.
>> >
>> > The reverse index is :
>> > Entry ID -> <parentId, RDN>
>> >
>> > We don't have duplicated values.
>> >
>> > Now, when we have an entry ID, there is no simple way to get the list of
>> > all
>> > the children for this entry.
>> >
>> > We will have to add a third index to deal with such searches :
>> > ParentId -> <entryId, ....>
>> >
>> > which will list all the children of a specific entry.
>> >
>> > I'm going to investigate around this idea i the next few days.
>> >
>> > Thoughts ?
>>
>> Do we really need the additional index? My idea was to avoid this
>> additional index. Therefor I introduced a new cursor, the
>> RdnIndexTreeCursor [1] together with a RdnIndexHelper [2].
>>
>> Let's take the following example DIT:
>>
>> dc=example,dc=com (1)
>> * ou=people (3)
>> ** uid=elecharny (4)
>> ** uid=seelmann (6)
>> * ou=groups (2)
>> ** ou=directory (5)
>> ** ou=mina (7)
>>
>> The corresponding RDN index looks like this:
>>
>> <parentId, RDN> -> <entryId>
>> 0,dc=example,dc=com -> 1
>> 1,ou=groups -> 2
>> 1,ou=people -> 3
>> 2,ou=directory -> 5
>> 2,ou=mina -> 7
>> 3,uid=elecharny -> 4
>> 3,uid=seelmann -> 6
>>
>> One-level traversal works as follows: the RdnIndexTreeCursor is
>> initialzed with the parentId (say 2 for ou=groups). To be able to
>> limit the cursor's range a start element and a stop element are
>> created. Both are ParentIdAndRdn objects with the same parentId (in
>> our example 2). The start element's RDN is null. The stop elements's
>> RDN is a special RDN zzz=zzz. Those ements are used to position the
>> cursor with first()/beforeFirst()/last()/afterLast(). The next() and
>> previous() methods check that the range isn't exceeded.
>
>
> Do you mean that thos elements are created and injected in the RDN index for
> each entry having children ?


No, they are just used just used to position the cursor.

>> Sub-Level traversal is a bit more complex but still straight-forward.
>> As the search base is included in the search results the base
>> ParentIdAndRdn object is used as start and stop element (say
>> <1,ou=groups>). Depth-first traversal is used and a stack of cursors
>> is maintained. When advancing the cursor it checks if the current
>> entry has children, if that's the case a sub-cursor with a limited
>> range (like the one-level) is opened and put to the stack.
>>
>> The RdnIndexTreeCursor is used by OneLevelScopeCursor and SubScopeCursor.
>>
>> The OneLevelScopeEvaluator and SubScopeEvaluator use the
>> RdnIndexHelper, especially the methods isDirectDescendantOf(ID
>> ancestorId, ID descendantId) and isDescendantOf(ID ancestorId, ID
>> descendantId). They just do reverse lookups from the descendantId,
>> till the ancestor is found. It is possible to cache those lookups for
>> branch entries so that only one lookup is required.
>>
>> Finally the DefaultOptimizer uses the RdnIndexHelper to determine the
>> sub-level count. Therefor the ParentIdAndRdn now contains two
>> additional fields oneLevelCount and subLevelCount that are
>> incremented/decremented when adding/removing/moving entries to/from
>> the partition.
>>
>> Let's talk about the downside: This solution should work for flat
>> trees, but not for very deep trees. For very deep trees the
>> RdnIndexTreeCursor needs to maintain a large number of opened cursors.
>
>
> we can close the cursors when they are not used anyöore. Doing a width first
> search, we will only create as many cursor as we need for one single level.
> This should not be overkilling.
>
>>
>> Also the caching of the isDescendantOf information would require more
>> memory for deep trees.
>>
>> Do you think that is a feasible approach?
>
>
> it should work...
>>
>> I try to update the branch tomorrow, but can't promise anything.
>
>
> ok, many thanks ! I tested the branch, there are some failing tests in
> core-integ, but had little time to investigate.

Yes, some tests are still failing. For example I observed some kind of
deadlock when recursively deleting entries in a deep nested tree.

But unfortunately I didn't manage to continue with the branch. I tried
to merge the recent changes from trunk but come conflicts occured. I
think it doesn't make sense to continue for me because due the the
high trunk activity it takes too much time to merge in the trunk
changes.

Kind Regards,
Stefan

Re: RDN index, oneLevel and sublevel index merge

Reply via email to