According to the website's "Query Syntax" page, fuzzy searches are given a boost of 0.2. I've found this not to be the case. Rather, it appears to me (please confirm) that the search collects every variant of the term found in the index whose similarity, dist = 1 - (levenshteinDistance / min(termLength, variantLength)), is greater than 0.5. This then leads to a boolean OR search across all the variant terms, with each variant's boost set to (dist - 0.5) * 2.
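The matching rule can be made concrete with a minimal standalone Java sketch. It is a hypothetical illustration, not Lucene's source: the class name FuzzyBoostSketch is invented, and it assumes similarity = 1 - levenshteinDistance / min(termLength, variantLength), a threshold of 0.5 (variants at or below it are excluded), and boost = (similarity - 0.5) * 2.

```java
import java.util.Locale;

// Hypothetical sketch of the fuzzy-match arithmetic as read from observed behaviour.
// NOT Lucene's actual implementation; threshold and boost formula are assumptions.
public class FuzzyBoostSketch {
    static final double THRESHOLD = 0.5;

    // Classic dynamic-programming Levenshtein distance (insert, delete, substitute each cost 1).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // previous row becomes the one just filled
        }
        return prev[b.length()];
    }

    // Assumed similarity: 1 - edits / length of the shorter string.
    static double similarity(String term, String variant) {
        int minLen = Math.min(term.length(), variant.length());
        return 1.0 - (double) levenshtein(term, variant) / minLen;
    }

    // Assumed boost; returns 0.0 when the variant would not be added to the OR query.
    static double boost(String term, String variant) {
        double dist = similarity(term, variant);
        return dist > THRESHOLD ? (dist - THRESHOLD) * 2 : 0.0;
    }

    public static void main(String[] args) {
        for (String v : new String[] {"adagio", "adagia", "quincy"}) {
            System.out.printf(Locale.ROOT, "%s: similarity=%.3f boost=%.3f%n",
                    v, similarity("adagio", v), boost("adagio", v));
        }
        // prints:
        // adagio: similarity=1.000 boost=1.000
        // adagia: similarity=0.833 boost=0.667
        // quincy: similarity=0.000 boost=0.000
    }
}
```

Under these assumptions, a fuzzy search for "adagio" gives the variant "adagia" (one substitution over six characters) a boost of about 0.67, while "quincy" is excluded entirely.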
Is that more or less correct?

If so, it means that, given a document set with the following search field values:

  "adagio b"
  "adagio c"
  "adagio d"
  "adagio e"
  "adagio f"
  "adagio g"
  "adagia m"   // Note the variation from 'adagio'
  "quincy b"
  "quincy c"
  "quincy d"
  "quincy e"
  "quincy f"
  "quincy g"

a fuzzy search for "adagio~" will actually rank "adagia m" first, even though it is a greater Levenshtein distance from the search term than several exact matches. I believe this is because the term "adagia" has a much lower document frequency (it occurs in one document, versus six for "adagio"). The promotion "adagia m" gets from its high Similarity.idf() score therefore more than outweighs its lower fuzzy boost (a similarity of 1 - 1/6, about 0.83, which gives a boost of about 0.67), versus the 1.0 that the exact matches receive.

Proposed solution: if the boost calculated above were multiplied by 0.2 for *non-exact-match* fuzzy terms, but not for exact-matching terms, this problem would be mitigated. Thoughts?

While puzzling through this, I threw together a little test app that creates an index containing the above strings and passes your command-line arguments in as search terms. You can find it at:

  http://patrick.bpallen.com/~cormac/levtest.java

Usage:

  java -classpath lucene-1.2.jar:. levtest search-terms

(Replace 'search-terms' with your search query.)

Incidentally, this tool is also useful for confirming the bug I just posted (#18014): fuzzy searches are case sensitive. Use the tool to search for 'ADAgio~' and no results come back.

Regards,
--Cormac Twomey