For "fuzzy" matching you're going to pay one way or another.
You can use n-gram analyzers on indexed content and queries, which adds
IO costs ("files" becomes "fi", "fil", "file", "il", "ile", "iles", and
so on, in both your query and your index), or you can run some form of
query-time edit-distance comparison on "files" and pay the CPU cost.
You can use WordNet, and "files" becomes "registers". You can examine
large volumes of user queries and look at the most likely
interpretation. Or you can use Soundex, and then, if you're lucky,
files == philes - but there's no room for error: terms either match or
they don't, and there is no measure of similarity.
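To make the first two costs concrete, here is a minimal sketch (plain Python, not Lucene's NGramTokenizer) of character n-gram expansion and n-gram overlap scoring; the `ngrams` and `overlap` helper names are invented for illustration:

```python
# Sketch of index/query-time character n-grams. Every term fans out
# into many grams - those extra postings are the IO cost mentioned above.

def ngrams(term, min_n=2, max_n=3):
    """All character n-grams of `term` with lengths min_n..max_n."""
    return {term[i:i + n]
            for n in range(min_n, max_n + 1)
            for i in range(len(term) - n + 1)}

def overlap(query, indexed):
    """Fraction of the query's n-grams also present in the indexed term."""
    q, t = ngrams(query), ngrams(indexed)
    return len(q & t) / len(q)

print(sorted(ngrams("files")))       # the postings "files" expands into
print(overlap("philes", "files"))    # partial match despite the typo
```

Unlike Soundex, this gives a graded similarity score: "philes" shares a good fraction of its grams with "files" rather than matching all-or-nothing.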
There's no free lunch here.
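For contrast, here is a rough sketch of American Soundex (a hand-rolled illustration, not any particular library's implementation) showing the binary match/no-match behaviour described above:

```python
# Minimal American Soundex sketch: first letter kept, remaining
# consonants coded, vowels dropped. Codes either match exactly or not.

CODES = {c: d for d, letters in
         enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], 1)
         for c in letters}

def soundex(word):
    word = word.lower()
    first = word[0].upper()
    digits = []
    prev = CODES.get(word[0])
    for c in word[1:]:
        code = CODES.get(c)
        if c in "hw":
            continue                 # h/w do not reset the previous code
        if code and code != prev:
            digits.append(str(code))
        prev = code
    return (first + "".join(digits) + "000")[:4]

print(soundex("files"), soundex("philes"))
```

Under standard Soundex the first letter is kept literally, so "files" codes to F420 and "philes" to P420: no match, and no score telling you how close they came.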
Timo Nentwig wrote:
On Saturday 24 November 2007 18:28:48 markharw00d wrote:
term. You can limit the number of edit distance comparisons conducted by
setting the minimum prefix length. This is a property of the QueryParser
Well, the javadoc says: "prefixLength - length of common (non-fuzzy) prefix". So
this is some kind of "wildcard fuzzy", but not real fuzzy anymore.
I understand the optimization, but right now I can hardly imagine a reasonable
use case. Who cares whether the Levenshtein edits are at the beginning, middle,
or end of the word? E.g. when searching fuzzily for "philes" I want to
find "files"...
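Timo's objection can be seen in a small sketch (illustrative Python, not Lucene's FuzzyTermEnum; `fuzzy_match` and its parameters are made-up names): a non-fuzzy prefix prunes candidate terms before any edit-distance work, which is exactly why "philes" can never reach "files" once prefixLength > 0.

```python
# Standard dynamic-programming Levenshtein distance.
def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def fuzzy_match(query, term, prefix_len=1, max_edits=2):
    """Only terms sharing the first prefix_len chars are even compared."""
    if query[:prefix_len] != term[:prefix_len]:
        return False             # pruned: no edit-distance CPU spent
    return levenshtein(query, term) <= max_edits

print(fuzzy_match("philes", "files", prefix_len=0))  # True: distance is 2
print(fuzzy_match("philes", "files", prefix_len=1))  # False: 'p' != 'f'
```

The trade-off is real, though: with prefix_len=0 every term in the index is a candidate for the (expensive) distance computation, which is the CPU cost the first message warns about.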
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------