Hi guys, currently, when we do a search, we try to evaluate the number of candidates each node in the filter expression will return. For instance, such a filter :
(&(cn=test)(sn=acme)) will evaluate the number of entries with a CN = test and the number of entries with a SN = acme. Let say the former has 10 entries and the last has 5 entries, we will annotate each node with this number : (&:[5](cn=test:[10])(sn=acme:[5])) Here, the & will have the lowest number, and at the end, we will select the node whith the lowest number as a base to fetch the entries. So far, so good (I don't ant to go any further about how we evaluate other operators, like | and !, it's way more complex but irrelevant here). Every filter will also have an additional node added : the scope. The scope allows us to limit the number of entries to fetch, depending on where in the tree we are. That can save some time. In our example, let's say we have 1000 entries under ou=system, 100 under ou=users,ou=system and 3 under ou=unitTest,ou=users,ou=system. The filter will be transformed internally to this filter (assuming the scope is SUBTREE) : (&(scope=SUB)(cn=test)(sn=acme)) and annoted as : base search = "ou=system" : (&:[5](scope=SUB[1000](cn=test:[10])(sn=acme:[5])) base search = "ou=users,ou=system" : (&:[5](scope=SUB[100](cn=test:[10])(sn=acme:[5])) base search = "ou=unitTest,ou=users,ou=system" : (&:[3](scope=SUB[3](cn=test:[10])(sn=acme:[5])) As we can see, the scope can change the evaluation. In the last case, we will pick the scope node to fetch the entries, as we will only get 3 of them. Ok, let's go a step further. Those filters were pretty simple to handle, as the underlying database know how many entries will be fetched when handling a filter like (cn=test). Substring, <=, >= filters are way different. We can't know many entries will be >= a key, or which match the substring. For instance, how do we know how many entries will match cn=te* ? We can find the first key matching cn=te in the index, but we will have to walk the index up (or down) to find the entries matching cn=tf*. That means we have acess to the ordering method, to build the key which comes after 'te' (ie, 'tf'), something which is quite difficult to do atm... So... I modified the evaluation for the substring/lesserThen/greaterThan indexed filters, to use a count which is different from the number of element in the index. Right now, the count will always be 10. Why 10 ? Why a fixed count ? - A fixed count is assuming that the user *knows* what he/she is doing when doing a search. Of course, some might do things like (cn=a*), which might return thousands of candidate, but let's say it's not likely. - 10 because it gives a chance that a better index can be selected. I don't really like the '10' value, but it works well. Keep in mind that it's just a lower bound. That does not mean the filter will just return 10 candidate max ! It's just a way for the optimizer to select the substring index instead of defaulting to something else with tens of thousands of candidates. May be 100 would be better... Keep in mind that if there are only 2 candidates that match the filter, we will have the filter selected with 10 as a max count, but we will not fetch 10 candiates, just 2 ! I have modified the server to work this way, if you have better ideas, I'm listening ! -- Regards, Cordialement, Emmanuel Lécharny www.iktek.com
