Re: Substring filter : how to improve their use ?

Ludovic Poitou Fri, 22 Feb 2013 00:00:18 -0800

Hi Emmanuel,

You might want to eval the scope after the filter rather than initially.
In most real life deployments, applications are searching for a single user, 
like (uid=elecharny), as a subtree through the whole directory. The scope 
becomes irrelevant then.
So checking the scope first adds processing of no value.


Just my 2 cents.

Ludo  

--  
Ludovic Poitou
http://ludopoitou.wordpress.com


On Thursday, February 21, 2013 at 19:14 , Emmanuel Lécharny wrote:

> Hi guys,
>  
> currently, when we do a search, we try to evaluate the number of
> candidates each node in the filter expression will return. For instance,
> such a filter :
>  
> (&(cn=test)(sn=acme))
>  
> will evaluate the number of entries with a CN = test and the number of
> entries with a SN = acme. Let say the former has 10 entries and the last
> has 5 entries, we will annotate each node with this number :
>  
>  
> (&:[5](cn=test:[10])(sn=acme:[5]))
>  
> Here, the & will have the lowest number, and at the end, we will select
> the node whith the lowest number as a base to fetch the entries.
>  
> So far, so good (I don't ant to go any further about how we evaluate
> other operators, like | and !, it's way more complex but irrelevant here).
>  
> Every filter will also have an additional node added : the scope. The
> scope allows us to limit the number of entries to fetch, depending on
> where in the tree we are. That can save some time.
>  
> In our example, let's say we have 1000 entries under ou=system, 100
> under ou=users,ou=system and 3 under ou=unitTest,ou=users,ou=system. The
> filter will be transformed internally to this filter (assuming the scope
> is SUBTREE) :
>  
> (&(scope=SUB)(cn=test)(sn=acme))
>  
> and annoted as :
>  
>  
> base search = "ou=system" :
> (&:[5](scope=SUB[1000](cn=test:[10])(sn=acme:[5]))
>  
> base search = "ou=users,ou=system" :
> (&:[5](scope=SUB[100](cn=test:[10])(sn=acme:[5]))
>  
> base search = "ou=unitTest,ou=users,ou=system" :
> (&:[3](scope=SUB[3](cn=test:[10])(sn=acme:[5]))
>  
> As we can see, the scope can change the evaluation. In the last case, we
> will pick the scope node to fetch the entries, as we will only get 3 of
> them.
>  
> Ok, let's go a step further. Those filters were pretty simple to handle,
> as the underlying database know how many entries will be fetched when
> handling a filter like (cn=test).
>  
> Substring, <=, >= filters are way different. We can't know many entries
> will be >= a key, or which match the substring. For instance, how do we
> know how many entries will match cn=te* ? We can find the first key
> matching cn=te in the index, but we will have to walk the index up (or
> down) to find the entries matching cn=tf*. That means we have acess to
> the ordering method, to build the key which comes after 'te' (ie, 'tf'),
> something which is quite difficult to do atm...
>  
> So...
>  
> I modified the evaluation for the substring/lesserThen/greaterThan
> indexed filters, to use a count which is different from the number of
> element in the index. Right now, the count will always be 10. Why 10 ?
> Why a fixed count ?
>  
> - A fixed count is assuming that the user *knows* what he/she is doing
> when doing a search. Of course, some might do things like (cn=a*), which
> might return thousands of candidate, but let's say it's not likely.
> - 10 because it gives a chance that a better index can be selected.
>  
> I don't really like the '10' value, but it works well. Keep in mind that
> it's just a lower bound. That does not mean the filter will just return
> 10 candidate max ! It's just a way for the optimizer to select the
> substring index instead of defaulting to something else with tens of
> thousands of candidates.
> May be 100 would be better...
>  
> Keep in mind that if there are only 2 candidates that match the filter,
> we will have the filter selected with 10 as a max count, but we will not
> fetch 10 candiates, just 2 !
>  
> I have modified the server to work this way, if you have better ideas,
> I'm listening !
>  
> --  
> Regards,
> Cordialement,
> Emmanuel Lécharny
> www.iktek.com (http://www.iktek.com)  
>  
>

Re: Substring filter : how to improve their use ?

Reply via email to