[index] reverse index usage for user attributes

Emmanuel Lécharny Thu, 22 Mar 2012 10:20:43 -0700

Hi,

Currently, we create a forward and a reverse index for each indexattribute. The forward index is used when we annotate a filter,searching for the number of candidate it filters. The reverse indexusage is a bit more subtile.

Let's consider a filter like (cn=jo*). As we don't have any clue aboutthe value (it will start with 'jo', but that's all we know), we can'tuse the 'cn' forward index. We can't use the 'cn' reverse index either,at least directly. So the engine will use the scope to determinate alist of candidate : {E1, E2, ... En}. Now, instead of fetching all thoseentries from the master table, what we can do is to use the 'cn' reversetable, as we have a list of entry IDs.

For each entry id in {E1, E2, ... En}, we will check if the 'cn' reversetable contain some value starting with 'jo'.


If th 'cn' index contains the two following table :
forward =
  john --> {E1, E3}
  joe  --> {E2, E3, E5}
  ...

reverse =
  E1 --> john
  E2 --> joe
  E3 --> joe, john
  E5 --> joe
  ...

then using the reverse table, we will find that the E1, E2, E3 and E5entry match, when all the other aren't. No need to fetch the E4, ... Enentries from the master table.

Now, exploiting this rverse table means we read a btree. Which has thesame cost than reading the Master Table (except that we won't have todeserialize the entry).


What if we don't have a reverse table ?

We will have to deserialize the entries, all of them from the {E1, E2,... En} set filtered by the scope.

Is this better then to build a reverse index ? Not necessarily. In thecase where we have more than one node in the filter, for instance(&(cn=jo*)(sn=test)(objectClass=person)), then using the reverse indexmeans we will access as many btrees as we have index attributes in thefilter node. Here, if cn, sn and objectClass are indexed, we willpotentially access 4 btrees (the scope will force us to consider usingthe oneLevel index or the subLevel index).

At the end, it might be more costly, compared to using the entry andmatch it against the nodes in the filter.

When we have many entries already in the cache, thus sparing the cost ofa deserialization, then accessing more than one BTree might be costly,comparing to using the entries themselves.


An alternative
--------------

The pb is that the Node in the filter use a substring : jo*. Our indexare built using the full normalized value. That does not fit.

What if instead of browsing the underlying tree, we use a specificcompare() method instead of an equals() method ? As the tree is ordered,finding the subset of entries taht will be valid candidate is just amatter of finding the part of the tree which match the comparison.

In our example, the compare() method will compare the first caractersfrom the keys to the filter value. Here, the compare method will be theString.startWith(), and if it's not true, we will do a compareTo() toknow if we should go donw th tree on the left or on the right of thecurrent key.


For that, we just need the forward index, the reverse index is useless.

If we use a substring with a '*' at the beginning, then we need anotherindex : a reverted index. Tht means all the values will be storedreverted : john will be stored as 'nhoj', so doing a search on (cn=*hn)will be fast.


Thoughts ?

--
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com

[index] reverse index usage for user attributes

Reply via email to