> have a look at the FuzzyTermEnum class in Lucene.
The FuzzyTermEnum class is truly useful... if only I could get it to run a bit faster. By faster I mean on the order of one second for a half-gigabyte index; currently the best I get is five seconds.

What I am trying to accomplish:

- If a query does not yield any results, choose from all similar terms the one that occurs most often in the index, and display it.

What I have tried so far:

- Required the first three characters to match exactly and excluded them from the similarity search (time reduced from 15s to 5s).
- Increased FUZZY_THRESHOLD to 1.75 (no significant effect on time).
- Only executed termCompare for terms with a higher frequency than the best matching term seen so far (no effect).

Observations:

- Time seems to be independent of the frequency of a term.

Any further ideas would be greatly appreciated!

Also (dear committers...), it would be great if FuzzyTermEnum could be subclassed, rather than having to resort to copy and paste (the class is final).

--
Eric Jain
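
For illustration, the approach described above (exact match on the first three characters, a frequency check before the expensive comparison, then a similarity test) could be sketched roughly as follows. This is a standalone sketch, not Lucene code: the class SuggestTerm, the in-memory TreeMap of term frequencies, and the maxEdits cutoff are all hypothetical stand-ins for the index's term dictionary and for FUZZY_THRESHOLD, and a plain Levenshtein distance is assumed as the similarity measure.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical helper; the names here are illustrative, not part of Lucene.
class SuggestTerm {

    /** Plain Levenshtein distance, computed with two rolling rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /**
     * Returns the most frequent term that shares the first prefixLen
     * characters with the query and whose remainder is within maxEdits
     * edits of the query's remainder, or null if there is none.
     */
    static String suggest(TreeMap<String, Integer> termFreqs, String query,
                          int prefixLen, int maxEdits) {
        if (query.length() < prefixLen) {
            return null;
        }
        String prefix = query.substring(0, prefixLen);
        // Range scan over the sorted terms: only those starting with the prefix.
        SortedMap<String, Integer> candidates =
                termFreqs.subMap(prefix, prefix + Character.MAX_VALUE);

        String best = null;
        int bestFreq = 0;
        for (Map.Entry<String, Integer> e : candidates.entrySet()) {
            // Cheap check first: skip terms that cannot beat the current best.
            if (e.getValue() <= bestFreq) {
                continue;
            }
            // The prefix is known to match, so compare only the remainders.
            int d = editDistance(query.substring(prefixLen),
                                 e.getKey().substring(prefixLen));
            if (d <= maxEdits) {
                best = e.getKey();
                bestFreq = e.getValue();
            }
        }
        return best;
    }
}
```

In this sketch the prefix restriction shrinks the set of terms that are visited at all, whereas the frequency check only skips individual comparisons within that set.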
