--On Monday, June 05, 2006 5:25 PM -0400 Alex Karasulu <[EMAIL PROTECTED]> wrote:

Quanah Gibson-Mount wrote:



--On Monday, June 05, 2006 2:54 PM -0400 Alex Karasulu
<[EMAIL PROTECTED]> wrote:

I assume it should also handle approx, which is not the same as
substring.


No, as I mentioned before, ApacheDS does not do approximate matching,
so it does not have an option to create approx indices.


Right; my point was that I'd assume it should be added so that it could
be supported....

I don't find the approx matching algorithms based on soundex etc. to be
all that useful.  Plus the indices get bloated and the server's write
performance degrades much faster.  Approx indices must generate all
the variants of a word using these algorithms, and there can be many of
them.  Every add, delete or modify operation must then regenerate these
soundex derivatives for the old values as well as the new values in the
modify op.  Keep in mind also that some attributes will be multivalued,
so the explosion can be quite large.
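To make the write-cost argument concrete, here is a minimal sketch of an approx index, assuming classic American Soundex with one code per word (real servers may derive more keys per value). The `ApproxIndex` class, its methods and the `writes` counter are hypothetical illustrations, not ApacheDS or OpenLDAP code. It shows that a modify must recompute and remove the keys of every old value before adding the keys of every new value, for each value of a multivalued attribute.

```python
from collections import defaultdict

# American Soundex digit classes; vowels, y, h and w have no code.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character American Soundex code of a word."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = [letters[0].upper()]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h/w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result.append(code)
        prev = code  # a vowel resets prev, so a repeated code after it counts
        if len(result) == 4:
            break
    return "".join(result).ljust(4, "0")


class ApproxIndex:
    """Hypothetical approx index mapping soundex code -> set of entry DNs."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.writes = 0  # number of index mutations performed

    def _keys(self, values):
        # Every word of every value of a multivalued attribute yields a key.
        return {soundex(w) for v in values for w in v.split() if soundex(w)}

    def add(self, dn, values):
        for key in self._keys(values):
            self.postings[key].add(dn)
            self.writes += 1

    def delete(self, dn, values):
        for key in self._keys(values):
            self.postings[key].discard(dn)
            if not self.postings[key]:
                del self.postings[key]
            self.writes += 1

    def modify(self, dn, old_values, new_values):
        # Keys derived from the old values must be regenerated and removed
        # before keys derived from the new values can be added.
        self.delete(dn, old_values)
        self.add(dn, new_values)

    def lookup(self, word):
        return set(self.postings.get(soundex(word), set()))
```

For example, after `idx.add("uid=jsmith,ou=people", ["John Smith", "Johnny Smith"])`, a lookup for "Smyth" finds the entry because "Smith" and "Smyth" both encode to `S530`. Modifying the value to `["Jon Smyth"]` costs two removals and two insertions even though the phonetic keys end up identical, which is the write amplification described above.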

IMO approx match is one of those things that was a good idea but is not
critical or used all that much.  If we find the time or if you're
interested you can implement this feature.  For now no index is created
for approx matching.


I think the concept of applying all indexing to all attributes is in itself broken. As someone who has been running Stanford's directory service for 7 years, I can say we have reasons for indexing particular attributes the way we do. Part of the reason is sometimes to limit the feasibility of certain searches (leaving substr off of some attributes, for example).

In addition, soundex is quite useful for white pages lookups, when someone knows a last name by sound but not by spelling.


In any case, the choice is obviously yours, but I think the thinking so far is flawed.


--Quanah


--
Quanah Gibson-Mount
Principal Software Developer
ITS/Shared Application Services
Stanford University
GnuPG Public Key: http://www.stanford.edu/~quanah/pgp.html
