Re: Update about subtree problems

Pierre-Arnaud Marcelot Tue, 20 Jul 2010 00:06:02 -0700

Hi Emmanuel,

As we've seen together, that's not an easy subject and there's no real 
solution, just relative trade-offs.


I really believe that a good discussion on the ML is necessary in order to be 
sure about what type of operation is to be preferred in those kind of 
situations.

A community-wide consensus is really important and will help a lot simplifying 
the decisions in the future.

Regards,
Pierre-Arnaud


On 20 juil. 2010, at 01:34, Emmanuel Lecharny wrote:

> Hi guys,
> 
> just an update about the current situation.
> 
> We discussed on this ML about options regarding how to handle subtrees in the 
> most efficient way in the server. We currently have 3 possibilities :
> 
> - the first one, which is partially implemented in trunk, consist on static 
> references to subentries in all the entries selected by the 
> subtreeSpecifications defined in each subentry.
> 
> - the second one suppose that we compute when needed if an entry is a 
> selected entry, each time we access this entry
> 
> - the third solution, which has not been discussed already, would be to have 
> an index which stores the list of all the selected entries associated with 
> each subentry.
> 
> Let's first think about the third solution (this is the one implemented by 
> OpenDS, btw). It's a costly one when you create a new subentry, or when you 
> move one, as you have to update the index with all the impacted entries. Plus 
> it also has a cost as you have to hit the index to get the list of subentries 
> an entry is part of. Of course, we can use a cache, but I'm not sure that it 
> will save a lot of time. In fact, AFAICT, you pay the price twice : when 
> creating the subentry, and each time you access any entry.
> 
> Ok, so option 3 does not seems to be reasonable. To be frank, we discussed 
> about t with Pierre-Arnaud but not really considered this as a decent 
> solution. We may be wrong though.
> 
> Regarding option 1 and 2, none is perfect.
> 
> Let's see the pros and cons for each of them
> 
> Solution 1 (update when manipulating the subentry)
> --------------------------------------------------
> pros :
> - *very* efficient operations on normal entries, as they already have a 
> reference to all the subentries that select them.
> - no extra cost when doing a rename, a modify, an add or a delete of a normal 
> entry.
> - all the subentries are stored in a cache loaded at startup, no need to 
> fetch an index to get a subentry
> - evaluating if an entry is selected is done when the subentry is added or 
> moved
> - move operations applied on normal entries are rare
> - any operation applied on a subentry are considered as admin operation, thus 
> unfrequent and scheduled. Downtime is under control.
> 
> cons :
> - *very* costly add/delete/move/rename operations applied on a subentry, as 
> every selected entries will have to be modified
> - *very* costly move operation of a normal entry if it moves from one AP to 
> another one, and when it has many children, as we have to update all the 
> children twice : first to reove the old subentries' reference, the second one 
> to add the new references
> - Adding or removing a subentry is a non atomic operation, as we have first 
> to update the selected entries one by one, and then to add/delete the subentry
> 
> Solution 2 (evaluating when needed )
> ------------------------------------
> pros :
> - adding/removing/moving/renaming a subentry is a O(1) operation
> - moving normal entries selected by some subentries does not cost more than a 
> standard move
> - easy to implement
> 
> cons :
> - each time we access an entry (search, compare), we will have to evaluate 
> the subtreeSpecifciation for all the subentries this entry is dependent on
> - we will need another cache to lookup for the entry's related subentries 
> (this cache will work pretty much like the one we use for the NamingContexts)
> 
> 
> Basically, sol.1 and sol.2 are both painful, but there is no perfect solution 
> anyway.
> 
> I thought about what is the best implementation all day long, and if 
> yesterday I was convinced that the solution 2 was better. But the evaluation 
> cost done on each entry every time a client does a search is quite high 
> (let's say up to 20% of the search cost, for simple SS). That will not slow 
> down the server that much, as most of the time is consummed in the network 
> layer right now, but this won't last.
> 
> Now I think that the current solution is may be a not so bad compromise, 
> assuming we don't do a lot of move operation cross APs on normal entries.
> 
> What bugs me here is that last year, for the opposite reason, the DN was 
> removed from the entries stored in the master table, just t be able to do 
> O(1) move operations. This is quite paradoxal ! I still think that move 
> operations are very rare, and that storing the DN into the entries is a net 
> gain for most of the operations, except for a move...
> 
> 
> Some side note :
> after having done some perf tests on the evaluator, and applied some 
> improvement, I can tell that depending on the number of subentries an entry 
> is depending on, the cost of this evaluation can goes up to 50% of the search 
> itself cost - not counting the network layer -. For instance, evaluating a 
> subtreeSpecification with a min and a max, no chop, will be done up to 1 000 
> 000 times per second on a 3 level DN (this is all dependent on the DN size)
> 
> Last, not least : the current implementation is really incomplete. The Move, 
> Rename and MoveAndRename operations are not correctly handled, with many 
> entries not being updated. I'm going to fix them. I have also created a 
> branch to play with subtree without breaking the trunk. I'm not sure I will 
> continue to work on this branch if a decision is  made to keep the first 
> solution.
> 
> 
> -- 
> Regards,
> Cordialement,
> Emmanuel Lécharny
> www.iktek.com
>

Re: Update about subtree problems

Reply via email to