Hi Emmanuel, As we've seen together, that's not an easy subject and there's no real solution, just relative trade-offs.
I really believe that a good discussion on the ML is necessary in order to be sure about what type of operation is to be preferred in those kind of situations. A community-wide consensus is really important and will help a lot simplifying the decisions in the future. Regards, Pierre-Arnaud On 20 juil. 2010, at 01:34, Emmanuel Lecharny wrote: > Hi guys, > > just an update about the current situation. > > We discussed on this ML about options regarding how to handle subtrees in the > most efficient way in the server. We currently have 3 possibilities : > > - the first one, which is partially implemented in trunk, consist on static > references to subentries in all the entries selected by the > subtreeSpecifications defined in each subentry. > > - the second one suppose that we compute when needed if an entry is a > selected entry, each time we access this entry > > - the third solution, which has not been discussed already, would be to have > an index which stores the list of all the selected entries associated with > each subentry. > > Let's first think about the third solution (this is the one implemented by > OpenDS, btw). It's a costly one when you create a new subentry, or when you > move one, as you have to update the index with all the impacted entries. Plus > it also has a cost as you have to hit the index to get the list of subentries > an entry is part of. Of course, we can use a cache, but I'm not sure that it > will save a lot of time. In fact, AFAICT, you pay the price twice : when > creating the subentry, and each time you access any entry. > > Ok, so option 3 does not seems to be reasonable. To be frank, we discussed > about t with Pierre-Arnaud but not really considered this as a decent > solution. We may be wrong though. > > Regarding option 1 and 2, none is perfect. > > Let's see the pros and cons for each of them > > Solution 1 (update when manipulating the subentry) > -------------------------------------------------- > pros : > - *very* efficient operations on normal entries, as they already have a > reference to all the subentries that select them. > - no extra cost when doing a rename, a modify, an add or a delete of a normal > entry. > - all the subentries are stored in a cache loaded at startup, no need to > fetch an index to get a subentry > - evaluating if an entry is selected is done when the subentry is added or > moved > - move operations applied on normal entries are rare > - any operation applied on a subentry are considered as admin operation, thus > unfrequent and scheduled. Downtime is under control. > > cons : > - *very* costly add/delete/move/rename operations applied on a subentry, as > every selected entries will have to be modified > - *very* costly move operation of a normal entry if it moves from one AP to > another one, and when it has many children, as we have to update all the > children twice : first to reove the old subentries' reference, the second one > to add the new references > - Adding or removing a subentry is a non atomic operation, as we have first > to update the selected entries one by one, and then to add/delete the subentry > > Solution 2 (evaluating when needed ) > ------------------------------------ > pros : > - adding/removing/moving/renaming a subentry is a O(1) operation > - moving normal entries selected by some subentries does not cost more than a > standard move > - easy to implement > > cons : > - each time we access an entry (search, compare), we will have to evaluate > the subtreeSpecifciation for all the subentries this entry is dependent on > - we will need another cache to lookup for the entry's related subentries > (this cache will work pretty much like the one we use for the NamingContexts) > > > Basically, sol.1 and sol.2 are both painful, but there is no perfect solution > anyway. > > I thought about what is the best implementation all day long, and if > yesterday I was convinced that the solution 2 was better. But the evaluation > cost done on each entry every time a client does a search is quite high > (let's say up to 20% of the search cost, for simple SS). That will not slow > down the server that much, as most of the time is consummed in the network > layer right now, but this won't last. > > Now I think that the current solution is may be a not so bad compromise, > assuming we don't do a lot of move operation cross APs on normal entries. > > What bugs me here is that last year, for the opposite reason, the DN was > removed from the entries stored in the master table, just t be able to do > O(1) move operations. This is quite paradoxal ! I still think that move > operations are very rare, and that storing the DN into the entries is a net > gain for most of the operations, except for a move... > > > Some side note : > after having done some perf tests on the evaluator, and applied some > improvement, I can tell that depending on the number of subentries an entry > is depending on, the cost of this evaluation can goes up to 50% of the search > itself cost - not counting the network layer -. For instance, evaluating a > subtreeSpecification with a min and a max, no chop, will be done up to 1 000 > 000 times per second on a 3 level DN (this is all dependent on the DN size) > > Last, not least : the current implementation is really incomplete. The Move, > Rename and MoveAndRename operations are not correctly handled, with many > entries not being updated. I'm going to fix them. I have also created a > branch to play with subtree without breaking the trunk. I'm not sure I will > continue to work on this branch if a decision is made to keep the first > solution. > > > -- > Regards, > Cordialement, > Emmanuel Lécharny > www.iktek.com >
