On Sun, Oct 9, 2011 at 7:00 PM, Emmanuel Lecharny <elecha...@apache.org> wrote: > > > On Sat, Oct 8, 2011 at 2:52 PM, Stefan Seelmann <seelm...@apache.org> wrote: >> >> Hi Emmanuel, >> >> On Sat, Oct 8, 2011 at 8:27 AM, Emmanuel Lecharny <elecha...@gmail.com> >> wrote: >> > Hi guys, >> > >> > Stefan started to modify the code to get rid of the oneLevel and >> > subLevel >> > index, which are more or less useless as we already have the hierarchy >> > stored into the rdn index. >> > >> > However, this rdn index is not good enough as is to be use as a >> > replacement >> > for the two other indexes. Its structure forbid us to easily retrieve >> > the >> > children from a known entry. >> > >> > The current RDN index structure is : >> > <parentId, RDN> -> Entry ID >> > >> > The key is a tuple containing the parent ID to be able to rebuild the >> > DN. >> > >> > The reverse index is : >> > Entry ID -> <parentId, RDN> >> > >> > We don't have duplicated values. >> > >> > Now, when we have an entry ID, there is no simple way to get the list of >> > all >> > the children for this entry. >> > >> > We will have to add a third index to deal with such searches : >> > ParentId -> <entryId, ....> >> > >> > which will list all the children of a specific entry. >> > >> > I'm going to investigate around this idea i the next few days. >> > >> > Thoughts ? >> >> Do we really need the additional index? My idea was to avoid this >> additional index. Therefor I introduced a new cursor, the >> RdnIndexTreeCursor [1] together with a RdnIndexHelper [2]. >> >> Let's take the following example DIT: >> >> dc=example,dc=com (1) >> * ou=people (3) >> ** uid=elecharny (4) >> ** uid=seelmann (6) >> * ou=groups (2) >> ** ou=directory (5) >> ** ou=mina (7) >> >> The corresponding RDN index looks like this: >> >> <parentId, RDN> -> <entryId> >> 0,dc=example,dc=com -> 1 >> 1,ou=groups -> 2 >> 1,ou=people -> 3 >> 2,ou=directory -> 5 >> 2,ou=mina -> 7 >> 3,uid=elecharny -> 4 >> 3,uid=seelmann -> 6 >> >> One-level traversal works as follows: the RdnIndexTreeCursor is >> initialzed with the parentId (say 2 for ou=groups). To be able to >> limit the cursor's range a start element and a stop element are >> created. Both are ParentIdAndRdn objects with the same parentId (in >> our example 2). The start element's RDN is null. The stop elements's >> RDN is a special RDN zzz=zzz. Those ements are used to position the >> cursor with first()/beforeFirst()/last()/afterLast(). The next() and >> previous() methods check that the range isn't exceeded. > > > Do you mean that thos elements are created and injected in the RDN index for > each entry having children ?
No, they are just used just used to position the cursor. >> Sub-Level traversal is a bit more complex but still straight-forward. >> As the search base is included in the search results the base >> ParentIdAndRdn object is used as start and stop element (say >> <1,ou=groups>). Depth-first traversal is used and a stack of cursors >> is maintained. When advancing the cursor it checks if the current >> entry has children, if that's the case a sub-cursor with a limited >> range (like the one-level) is opened and put to the stack. >> >> The RdnIndexTreeCursor is used by OneLevelScopeCursor and SubScopeCursor. >> >> The OneLevelScopeEvaluator and SubScopeEvaluator use the >> RdnIndexHelper, especially the methods isDirectDescendantOf(ID >> ancestorId, ID descendantId) and isDescendantOf(ID ancestorId, ID >> descendantId). They just do reverse lookups from the descendantId, >> till the ancestor is found. It is possible to cache those lookups for >> branch entries so that only one lookup is required. >> >> Finally the DefaultOptimizer uses the RdnIndexHelper to determine the >> sub-level count. Therefor the ParentIdAndRdn now contains two >> additional fields oneLevelCount and subLevelCount that are >> incremented/decremented when adding/removing/moving entries to/from >> the partition. >> >> Let's talk about the downside: This solution should work for flat >> trees, but not for very deep trees. For very deep trees the >> RdnIndexTreeCursor needs to maintain a large number of opened cursors. > > > we can close the cursors when they are not used anyƶore. Doing a width first > search, we will only create as many cursor as we need for one single level. > This should not be overkilling. > >> >> Also the caching of the isDescendantOf information would require more >> memory for deep trees. >> >> Do you think that is a feasible approach? > > > it should work... >> >> I try to update the branch tomorrow, but can't promise anything. > > > ok, many thanks ! I tested the branch, there are some failing tests in > core-integ, but had little time to investigate. Yes, some tests are still failing. For example I observed some kind of deadlock when recursively deleting entries in a deep nested tree. But unfortunately I didn't manage to continue with the branch. I tried to merge the recent changes from trunk but come conflicts occured. I think it doesn't make sense to continue for me because due the the high trunk activity it takes too much time to merge in the trunk changes. Kind Regards, Stefan