FuzzyQuery uses edit distance, so you could probably create a
JaroWinklerQuery that mimics FuzzyQuery but calculates the Jaro-Winkler
score instead of the edit distance. Dealing with phrases would get a
bit more complex, but you may be able to use PhraseQuery as an example
and then implement a JaroWinklerPhraseQuery or something like that.
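
For the scoring piece only, here is a minimal sketch of Jaro-Winkler
similarity in plain Java. It does not touch any Lucene classes; the
class name JaroWinkler and both method names are made up for
illustration, and the prefix weight of 0.1 with a 4-character prefix
cap is the commonly used setting, assumed here rather than taken from
your project. Some variants only apply the prefix boost above a Jaro
threshold; this sketch always applies it.

```java
public class JaroWinkler {

    /** Jaro similarity in [0, 1]; 1.0 means the strings are identical. */
    static double jaro(String s1, String s2) {
        int len1 = s1.length();
        int len2 = s2.length();
        if (len1 == 0 && len2 == 0) return 1.0;
        if (len1 == 0 || len2 == 0) return 0.0;

        // Characters count as matching only within this distance of each other.
        int window = Math.max(0, Math.max(len1, len2) / 2 - 1);
        boolean[] matched1 = new boolean[len1];
        boolean[] matched2 = new boolean[len2];
        int matches = 0;

        for (int i = 0; i < len1; i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(len2 - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (!matched2[j] && s1.charAt(i) == s2.charAt(j)) {
                    matched1[i] = true;
                    matched2[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) return 0.0;

        // Count matched characters that appear in a different order.
        int outOfOrder = 0;
        int k = 0;
        for (int i = 0; i < len1; i++) {
            if (!matched1[i]) continue;
            while (!matched2[k]) k++;
            if (s1.charAt(i) != s2.charAt(k)) outOfOrder++;
            k++;
        }

        double m = matches;
        double transpositions = outOfOrder / 2.0;
        return (m / len1 + m / len2 + (m - transpositions) / m) / 3.0;
    }

    /** Jaro-Winkler: Jaro plus a bonus for a shared prefix of up to 4 chars. */
    static double jaroWinkler(String s1, String s2) {
        double j = jaro(s1, s2);
        int maxPrefix = Math.min(4, Math.min(s1.length(), s2.length()));
        int prefix = 0;
        while (prefix < maxPrefix && s1.charAt(prefix) == s2.charAt(prefix)) {
            prefix++;
        }
        return j + prefix * 0.1 * (1.0 - j);
    }
}
```

A custom query would call something like jaroWinkler(queryTerm,
indexedTerm) for each candidate term and keep the terms that score
above your cut-off, in the same place FuzzyQuery compares edit
distance.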
-Grant
On Jan 7, 2008, at 8:36 AM, Shivani Sawhney wrote:
Hi All,
I am using Jaro-Winkler scoring in my current project for matching
names. The database of names against which the input value has to be
matched is huge, and thus we are faced with performance issues.
We now want Lucene to help us here; we want Lucene's speed for
handling huge data but still want to depend on Jaro-Winkler for its
scoring.
Some sample pairs that we expect to match are:
* 'Lowery, Betty' and 'Lowery, Betty Sue'
* 'Malik, Mohammad Saleem' and 'Malik, Mohammad Salim'
The questions I have are as follows:
* What, in Lucene, should be used for matching such Strings when a
cut-off score is also to be provided? I tried using a fuzzy query in
the following way but didn't know how to work with it when the String
to be matched has more than one word.
    QueryParser queryParser =
        new QueryParser(Constants.INDEX_KEY, new StandardAnalyzer());
    query = queryParser.parse(strSearchString
        + Constants.LUCENE_FUZZY_QUERY_SYMBOL + Constants.MATCH_CUT_OFF);
* Is Lucene scoring to be used here or simply the fuzzy query?
* Is there any way I can override Lucene's logic for String matching
and use Jaro-Winkler there instead?
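
On the multi-word case in the first question: with the classic
QueryParser syntax, a fuzzy operator appended once to the whole search
string attaches only to the last word. A minimal sketch of one
workaround, assuming the classic syntax in which term~0.7 marks a fuzzy
term with a minimum similarity, is to append the operator to each word
separately before parsing. The class and method names below are
hypothetical, and the code only builds the query string; it uses no
Lucene classes.

```java
import java.util.ArrayList;
import java.util.List;

public class FuzzyQueryStringBuilder {

    /**
     * Splits the input on anything that is not a letter or digit and
     * appends "~minSimilarity" to every resulting word.
     */
    static String perTermFuzzy(String input, String minSimilarity) {
        List<String> parts = new ArrayList<>();
        for (String token : input.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                parts.add(token + "~" + minSimilarity);
            }
        }
        return String.join(" ", parts);
    }
}
```

The resulting string can then be passed to queryParser.parse(...) in
place of the single concatenated string. Punctuation such as the comma
in 'Lowery, Betty' is dropped so that it does not end up inside a fuzzy
term. Newer Lucene releases interpret the number after ~ differently,
so the value passed as minSimilarity needs checking against the version
in use.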
Any help will be appreciated.
Thanks
Shivani Sawhney
--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com
http://www.lucenebootcamp.com
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ