Hi,

Currently, we create a forward and a reverse index for each index attribute. The forward index is used when we annotate a filter, searching for the number of candidate it filters. The reverse index usage is a bit more subtile.

Let's consider a filter like (cn=jo*). As we don't have any clue about the value (it will start with 'jo', but that's all we know), we can't use the 'cn' forward index. We can't use the 'cn' reverse index either, at least directly. So the engine will use the scope to determinate a list of candidate : {E1, E2, ... En}. Now, instead of fetching all those entries from the master table, what we can do is to use the 'cn' reverse table, as we have a list of entry IDs.

For each entry id in {E1, E2, ... En}, we will check if the 'cn' reverse table contain some value starting with 'jo'.

If th 'cn' index contains the two following table :
forward =
  john --> {E1, E3}
  joe  --> {E2, E3, E5}
  ...

reverse =
  E1 --> john
  E2 --> joe
  E3 --> joe, john
  E5 --> joe
  ...

then using the reverse table, we will find that the E1, E2, E3 and E5 entry match, when all the other aren't. No need to fetch the E4, ... En entries from the master table.

Now, exploiting this rverse table means we read a btree. Which has the same cost than reading the Master Table (except that we won't have to deserialize the entry).

What if we don't have a reverse table ?

We will have to deserialize the entries, all of them from the {E1, E2, ... En} set filtered by the scope.

Is this better then to build a reverse index ? Not necessarily. In the case where we have more than one node in the filter, for instance (&(cn=jo*)(sn=test)(objectClass=person)), then using the reverse index means we will access as many btrees as we have index attributes in the filter node. Here, if cn, sn and objectClass are indexed, we will potentially access 4 btrees (the scope will force us to consider using the oneLevel index or the subLevel index).

At the end, it might be more costly, compared to using the entry and match it against the nodes in the filter.

When we have many entries already in the cache, thus sparing the cost of a deserialization, then accessing more than one BTree might be costly, comparing to using the entries themselves.

An alternative
--------------

The pb is that the Node in the filter use a substring : jo*. Our index are built using the full normalized value. That does not fit.

What if instead of browsing the underlying tree, we use a specific compare() method instead of an equals() method ? As the tree is ordered, finding the subset of entries taht will be valid candidate is just a matter of finding the part of the tree which match the comparison.

In our example, the compare() method will compare the first caracters from the keys to the filter value. Here, the compare method will be the String.startWith(), and if it's not true, we will do a compareTo() to know if we should go donw th tree on the left or on the right of the current key.

For that, we just need the forward index, the reverse index is useless.

If we use a substring with a '*' at the beginning, then we need another index : a reverted index. Tht means all the values will be stored reverted : john will be stored as 'nhoj', so doing a search on (cn=*hn) will be fast.

Thoughts ?

--
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com

Reply via email to