Re: Search for similar terms

Eric Jain Mon, 02 Jun 2003 21:23:18 -0700

> have a look at the FuzzyTermEnum class in Lucene.


The FuzzyTermEnum class is truly useful... if only I could get it to be
a bit faster. By faster I mean on the order of one second for a
half-gigabyte index; currently the best I get is five seconds.


What I am trying to accomplish:

- If a query does not yield any results, pick from all similar terms the
one that occurs most often in the index, and display it.
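
To make the goal concrete, here is a minimal sketch in plain Java. It does
not use the Lucene API: the term dictionary is modelled as a hypothetical
Map from term text to document frequency, and "similar" is assumed to mean
"within maxEdits Levenshtein edits", which is not necessarily how
FuzzyTermEnum scores similarity. The names SuggestTerm, suggest and
maxEdits are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class SuggestTerm {

    // Classic Levenshtein edit distance, computed with two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the most frequent term within maxEdits of the query,
    // or null if no term is similar enough.
    // docFreq is a stand-in for the index's term dictionary (assumption).
    static String suggest(Map<String, Integer> docFreq, String query, int maxEdits) {
        String best = null;
        int bestFreq = 0;
        for (Map.Entry<String, Integer> e : docFreq.entrySet()) {
            if (editDistance(query, e.getKey()) <= maxEdits
                    && e.getValue() > bestFreq) {
                best = e.getKey();
                bestFreq = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Integer> docFreq = new HashMap<>();
        docFreq.put("protein", 120);
        docFreq.put("protean", 5);
        docFreq.put("kinase", 900);
        System.out.println(suggest(docFreq, "protien", 2));
    }
}
```

Every term in the dictionary is compared against the query here, so the
cost grows with the number of distinct terms, not with how often any one
of them occurs.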


What I have tried so far:

- Required the first three characters to match exactly and excluded them
from the similarity search (time reduced from 15s to 5s).
- Increased FUZZY_THRESHOLD to 1.75 (no significant effect on time).
- Only executed termCompare for terms with a higher frequency than the
best matching term seen so far (no effect).
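
For reference, the first and third attempts can be sketched as follows,
again in plain Java and not as FuzzyTermEnum code. The sorted term
dictionary is assumed to be a TreeMap from term text to document frequency
(hypothetical), the prefix is handled by scanning only the matching key
range, and the distance check gives up as soon as it cannot stay within
maxEdits. The early exit is an extra idea, not something from the list
above. All names are illustrative.

```java
import java.util.Map;
import java.util.TreeMap;

public class PrefixFuzzySuggest {

    // Levenshtein distance that stops early: returns max + 1 as soon as
    // the distance is known to exceed max.
    static int boundedDistance(String a, String b, int max) {
        if (Math.abs(a.length() - b.length()) > max) {
            return max + 1;
        }
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            int rowMin = curr[0];
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
                rowMin = Math.min(rowMin, curr[j]);
            }
            if (rowMin > max) {
                return max + 1; // no alignment can recover from here
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return Math.min(prev[b.length()], max + 1);
    }

    // Scans only terms sharing the first prefixLen characters with the
    // query, and skips the distance check for terms that cannot beat the
    // best frequency seen so far.
    static String suggest(TreeMap<String, Integer> docFreq, String query,
                          int prefixLen, int maxEdits) {
        if (query.length() < prefixLen) {
            return null;
        }
        String prefix = query.substring(0, prefixLen);
        String suffix = query.substring(prefixLen);
        String best = null;
        int bestFreq = 0;
        // Keys in [prefix, prefix + '\uffff') all start with the prefix.
        for (Map.Entry<String, Integer> e
                : docFreq.subMap(prefix, prefix + Character.MAX_VALUE).entrySet()) {
            if (e.getValue() <= bestFreq) {
                continue; // cannot improve on the current best
            }
            String rest = e.getKey().substring(prefixLen);
            if (boundedDistance(suffix, rest, maxEdits) <= maxEdits) {
                best = e.getKey();
                bestFreq = e.getValue();
            }
        }
        return best;
    }
}
```

In this sketch the prefix restriction shrinks the set of terms visited,
while the frequency check only saves distance computations for terms that
are visited anyway.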


Observations:

- Time seems to be independent of the frequency of a term.


Any further ideas would be greatly appreciated!

Also (dear committers...), it would be great if FuzzyTermEnum could be
subclassed, rather than having to resort to copy and paste (the class is
final).


--
Eric Jain

