thanks again for the info, Erik. I am looking at a great big index; I expect that we will always have . So, the FuzzyQuery is less viable now.
as for what I'm trying to do: We have a site search that uses Lucene in a basic way. It needs to be improved and I'm trying to research all the features of Lucene that we are not using. The reason for looking at FuzzyQuery was because terms in our database get mispelled and get typed in wierd ways by our users. The searches are done through a web interface on a very busy site, so performance is also a big issue. Alot of user generated content is indexed. thanks for your help. Also, Jim Hargrave had a good explaination too: As I understand it the Levenshtein algorithm is applied to each term in the index - then a Boolean query (OR) is formed from all the htis above a certain threshold. The Leven. algorithm is quadratic - so can be slow for larger strings. Smaller strings aren' too bad. However, if you have thousands of terms in your index this can be VERY slow. Especially if the query is redone for every 50 Lucene hits (see previous post on wildcard performace). IMHO, this is a part of Lucene that could be better. Using an "agrep" type algorithm would speed things up. Jim On Tue, 2004-03-02 at 10:43, Erik Hatcher wrote: > On Mar 2, 2004, at 1:23 PM, Supun Edirisinghe wrote: > > now, one more question: what are the big performance hits from using a > > FuzzyQuery. what are some bad cases to use it(eg. many words in the > > phrase? long strings? ) would it be better to read up on the > > Levenshtein > > algorithm or to get into the internals of Lucene and compare what is > > done in FuzzyQuery as opposed to something simpler like PhraseQuery > > To do a FuzzyQuery, *every* term in the field needs to be enumerated > and checked. This may or may not be a problem depending on the size of > your index, but it is definitely worth noting. > > What are you trying to accomplish? > > Erik > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
