Hi Emmanuel, You might want to eval the scope after the filter rather than initially. In most real life deployments, applications are searching for a single user, like (uid=elecharny), as a subtree through the whole directory. The scope becomes irrelevant then. So checking the scope first adds processing of no value.
Just my 2 cents. Ludo -- Ludovic Poitou http://ludopoitou.wordpress.com On Thursday, February 21, 2013 at 19:14 , Emmanuel Lécharny wrote: > Hi guys, > > currently, when we do a search, we try to evaluate the number of > candidates each node in the filter expression will return. For instance, > such a filter : > > (&(cn=test)(sn=acme)) > > will evaluate the number of entries with a CN = test and the number of > entries with a SN = acme. Let say the former has 10 entries and the last > has 5 entries, we will annotate each node with this number : > > > (&:[5](cn=test:[10])(sn=acme:[5])) > > Here, the & will have the lowest number, and at the end, we will select > the node whith the lowest number as a base to fetch the entries. > > So far, so good (I don't ant to go any further about how we evaluate > other operators, like | and !, it's way more complex but irrelevant here). > > Every filter will also have an additional node added : the scope. The > scope allows us to limit the number of entries to fetch, depending on > where in the tree we are. That can save some time. > > In our example, let's say we have 1000 entries under ou=system, 100 > under ou=users,ou=system and 3 under ou=unitTest,ou=users,ou=system. The > filter will be transformed internally to this filter (assuming the scope > is SUBTREE) : > > (&(scope=SUB)(cn=test)(sn=acme)) > > and annoted as : > > > base search = "ou=system" : > (&:[5](scope=SUB[1000](cn=test:[10])(sn=acme:[5])) > > base search = "ou=users,ou=system" : > (&:[5](scope=SUB[100](cn=test:[10])(sn=acme:[5])) > > base search = "ou=unitTest,ou=users,ou=system" : > (&:[3](scope=SUB[3](cn=test:[10])(sn=acme:[5])) > > As we can see, the scope can change the evaluation. In the last case, we > will pick the scope node to fetch the entries, as we will only get 3 of > them. > > Ok, let's go a step further. Those filters were pretty simple to handle, > as the underlying database know how many entries will be fetched when > handling a filter like (cn=test). > > Substring, <=, >= filters are way different. We can't know many entries > will be >= a key, or which match the substring. For instance, how do we > know how many entries will match cn=te* ? We can find the first key > matching cn=te in the index, but we will have to walk the index up (or > down) to find the entries matching cn=tf*. That means we have acess to > the ordering method, to build the key which comes after 'te' (ie, 'tf'), > something which is quite difficult to do atm... > > So... > > I modified the evaluation for the substring/lesserThen/greaterThan > indexed filters, to use a count which is different from the number of > element in the index. Right now, the count will always be 10. Why 10 ? > Why a fixed count ? > > - A fixed count is assuming that the user *knows* what he/she is doing > when doing a search. Of course, some might do things like (cn=a*), which > might return thousands of candidate, but let's say it's not likely. > - 10 because it gives a chance that a better index can be selected. > > I don't really like the '10' value, but it works well. Keep in mind that > it's just a lower bound. That does not mean the filter will just return > 10 candidate max ! It's just a way for the optimizer to select the > substring index instead of defaulting to something else with tens of > thousands of candidates. > May be 100 would be better... > > Keep in mind that if there are only 2 candidates that match the filter, > we will have the filter selected with 10 as a max count, but we will not > fetch 10 candiates, just 2 ! > > I have modified the server to work this way, if you have better ideas, > I'm listening ! > > -- > Regards, > Cordialement, > Emmanuel Lécharny > www.iktek.com (http://www.iktek.com) > >
