On 5/13/10 8:08 AM, Alex Karasulu wrote:
Hi Emmanuel,
On Sun, May 9, 2010 at 6:42 PM, Emmanuel Lecharny <[email protected]> wrote:
...
As soon as you consider that a delete operation will need to update all the
indexes the deleted entry uses, you see that you can benefit from having such
a reverse index: you don't have to grab the entry from the backend, as you
already have its ID, which is all you need. Thus, a delete operation
is just a matter of:
- get the entry ID
- for each index, if we have an <Entry ID> stored, then grab the associated
values, and for each value, delete them from the forward index.
- delete the entry from the MT
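The delete sequence above could be sketched like this. This is a minimal in-memory model, not the actual ApacheDS API: the map-based index tables, the `add`/`delete` helpers, and the entry IDs are all invented for illustration.

```java
import java.util.*;

public class ReverseIndexDelete {
    // forward index: attribute value -> set of entry IDs
    static Map<String, Set<Long>> forward = new HashMap<>();
    // reverse index: entry ID -> set of attribute values
    static Map<Long, Set<String>> reverse = new HashMap<>();
    // master table (MT): entry ID -> serialized entry
    static Map<Long, String> masterTable = new HashMap<>();

    static void add(long id, String value, String entry) {
        forward.computeIfAbsent(value, v -> new HashSet<>()).add(id);
        reverse.computeIfAbsent(id, i -> new HashSet<>()).add(value);
        masterTable.put(id, entry);
    }

    // Delete without ever deserializing the entry: the reverse index
    // already tells us which forward-index keys need cleaning up.
    static void delete(long id) {
        Set<String> values = reverse.remove(id);
        if (values != null) {
            for (String value : values) {
                Set<Long> ids = forward.get(value);
                ids.remove(id);
                if (ids.isEmpty()) forward.remove(value);
            }
        }
        masterTable.remove(id); // finally drop the entry from the MT
    }

    public static void main(String[] args) {
        add(1L, "john", "cn=john entry");
        add(2L, "john", "cn=john2 entry");
        delete(1L);
        System.out.println(forward.get("john"));      // entry 2 survives
        System.out.println(masterTable.containsKey(1L));
    }
}
```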
Yep this is one benefit of the reverse index.
At this point, I'm questioning whether that is really an advantage. We can
do almost the same thing using the forward index, assuming we know the
attribute we are dealing with; since we also have the value in the filter,
we can check that the entry we are looking at is present in the index
for this attribute. Let me explain with an example:
filter: (cn=value), cn is indexed.
You are looking for entry E(x).
The reverse index will contain:
E(x) --> value
The forward index will contain:
value --> E(1), E(5), ... E(x) ... E(N)
so you can still, in O(log(N)), find the entry in the forward index
without fetching the entry first.
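That forward-only check could look like the sketch below, assuming a B-tree-like sorted forward index (the class, the `cnForward` map, and `hasValue` are hypothetical names, not ApacheDS code):

```java
import java.util.*;

public class ForwardOnlyCheck {
    // forward index for cn, kept sorted as a B-tree would keep it
    static NavigableMap<String, NavigableSet<Long>> cnForward = new TreeMap<>();

    // Does entry `id` carry `value` for cn? Both probes are O(log N)
    // against the forward index alone: no reverse table, no entry fetch.
    static boolean hasValue(String value, long id) {
        NavigableSet<Long> ids = cnForward.get(value); // locate the value's ID set
        return ids != null && ids.contains(id);        // then probe for the entry ID
    }

    public static void main(String[] args) {
        cnForward.computeIfAbsent("value", v -> new TreeSet<>())
                 .addAll(Arrays.asList(1L, 5L, 42L));
        System.out.println(hasValue("value", 42L)); // E(42) is a match
        System.out.println(hasValue("value", 7L));  // E(7) is not
    }
}
```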
There are other uses for the
reverse index. To start let's just list the various index methods leveraging
the reverse index:
boolean reverse( ID id ) throws Exception;
boolean reverse( ID id, K attrVal ) throws Exception;
boolean reverseGreaterOrEq( ID id ) throws Exception;
boolean reverseGreaterOrEq( ID id, K attrVal ) throws Exception;
boolean reverseLessOrEq( ID id ) throws Exception;
boolean reverseLessOrEq( ID id, K attrVal ) throws Exception;
In fact, those methods don't exist anymore (don't ask me why, I
just grepped all the code). I don't know where you got them from ...
The only place where the reverse index is used atm is when checking that
an entry has a value.
These various reverse lookups are used all over the XDBM search code base
with various expression Evaluators and Cursors.
Well, not anymore :)
They are used to evaluate
assertion expressions in filters during search. Without reverse indices
entries must be accessed and deserialized from the master table for every
potential candidate before full filter evaluation takes place.
Nope. You can still use the forward index. But it might even be
valuable to grab the entry from disk and check it directly against the
filter in memory, instead of reading indexes. My perception is that we can
save a lot of time, assuming the set of returned entries is small and
you have more than a couple of filter expressions, as reading indexes
might lead to disk accesses.
Now, it all depends on the memory size. If all the indexes and entries are
in memory (the optimal situation), then fetching an entry costs nothing, and
we save all the index browsing by checking the entry against the filter
directly.
Of course, if we have disk accesses, then reading entries from disk is
costly, but so is the index browsing, as you will have to hit the disk
there too. Consider that you have N candidate entries, 1/3 of them being
valid candidates (i.e. they will be returned to the client); the other
filter expressions will be used to eliminate the 2/3 of bad entries.
If you have, say, M filter expressions, then you will potentially hit
the disk while reading the (M-1) other indexes, and then, when the entry
fits, read the entry from disk too. If the entry does not fit, you just
save the MT access. So for valid entries you have an extra cost, as you
read them from disk anyway, plus all the potential index disk accesses.
For invalid entries, you still have the index reads.
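One way to make that back-of-the-envelope comparison concrete is the model below. The cost constants and method names are invented for illustration; every index probe and MT read is counted as one disk access, which is of course a simplification.

```java
public class SearchCostModel {
    // Rough disk-access counts for evaluating N candidates against M
    // filter expressions, where one index was already used to get the
    // candidate set.

    // Strategy A (index-first): probe the remaining (M-1) indexes for
    // every candidate, then read only the matching entries from the MT.
    static long indexFirst(long n, long m, double validRatio) {
        long indexReads = n * (m - 1);                // index probes may hit disk
        long entryReads = Math.round(n * validRatio); // MT reads for valid entries
        return indexReads + entryReads;
    }

    // Strategy B (entry-first): fetch every candidate entry and evaluate
    // the whole filter in memory; no further index reads at all.
    static long entryFirst(long n) {
        return n; // one MT read per candidate
    }

    public static void main(String[] args) {
        long n = 900;            // candidates produced by the first index
        long m = 4;              // filter expressions
        double valid = 1.0 / 3;  // 1/3 of candidates are returned
        System.out.println("index-first: " + indexFirst(n, m, valid)); // 2700 + 300
        System.out.println("entry-first: " + entryFirst(n));           // 900
    }
}
```

With these (made-up) numbers the entry-first strategy does fewer disk accesses as soon as M grows past 2, which is exactly the intuition argued above; whether that holds on real data is what a branch experiment would have to show.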
Of course, this is all but proven; I'm talking from a theoretical POV. I'm
just trying to rationalise the algorithm, in order to have more than one
perspective on the search performance.
By no means do I think that the proposed solution is faster; I just think
that it might be a good idea to test it.
Now, an even better option: all this should be optional. If we can
fit all the data in memory, then I don't think that we should use
reverse indexes. If the database is big, and if it's proven that having
reverse indexes is beneficial, then we should create them.
In all cases, we should not close any door.
This could be
massively expensive and would cause cache turnover on most searches.
Performance would degrade considerably, and the whole search engine would
have to be altered to handle things like alias dereferencing differently,
since that relies heavily on reverse indices.
Certainly we can have improvements in the algorithm; it is by no means
perfect. However please realize that this search engine design and db
structure has suited our needs for over 8 years with minor tweaks here and
there. I'm fairly confident that the removal of reverse tables in indices
will be disastrous, but in no way would I say we should not give it a try.
But please experiment in another branch. This is by far one of the most
critical aspects of the LDAP server.
Well, I didn't intend to change *anything* atm. It's far too premature,
and I just wanted to push those ideas here for future evaluations.
That kind of experimentation obviously deserves to be done in a branch.
What I expect at the end of the day is that we find the best possible
algorithm, assuming that we don't have a fit-all solution. If only we
were able to tune the search algorithm depending on the memory size,
the data size, and the requests we receive, then it would be great!
In any case, it's an interesting discussion!
--
Regards,
Cordialement,
Emmanuel Lécharny
www.nextury.com