FuzzyQuery uses EditDistance, you probably could create a JaroWinklerQuery that mimics FuzzyQuery but calculates the JaroWinkler score instead of the edit distance. As for dealing with phrases, that would get a bit more complex, but you may be able to use PhraseQuery as an example and then implement a JaroWinklerPhraseQuery or something like that.

-Grant

On Jan 7, 2008, at 8:36 AM, Shivani Sawhney wrote:

Hi All,



I am using Jarowinkler scoring in my current project for matching names. The database of names against which the inputted value has to be matched is huge
and thus we are faced with performance issues.



We now want lucene to help us here; we want lucene's speed for handling huge
data but still want to depend on Jarowinkler for its scoring.



Some sample data that we expect a match between are, such as:

*       'Lowery, Betty' and 'Lowery, Betty Sue'
*       'Malik, Mohammad Saleem'  and 'Malik, Mohammad Salim'



The questions I have are as follows:

*       What, in lucene, should be used for doing a match between such
Strings where a cut-off score is also to be provided while matching them? I tried using fuzzy query in the following way but didn't know how to work
with it when the String to be matched has more than one word.



QueryParser queryParser = new QueryParser(Constants.INDEX_KEY, new
StandardAnalyzer());

       query = queryParser.parse(strSearchString +
Constants.LUCENE_FUZZY_QUERY_SYMBOL + Constants.MATCH_CUT_OFF);



*       Is Lucene scoring to be used here or simply the fuzzy query?
*       Is there any way I can override the way Lucene's logic for String
match and use Jarowinkler there instead.







Any help will be appreciated.



Thanks

Shivani Sawhney




--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com
http://www.lucenebootcamp.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ





---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to