Hi, Shivani, For my understanding, Jarowinkler doesn't quite fit with Lucene's structure. Calculating Jaro-Winkler distance for the query against each word in the index is quite computational intensive.
What's possible may be using SoundEx, Metaphone, Double Metaphone, etc, instead. For each word in the index, it can pre-calculate a value for the query to match against.(which is how Analyzer works). -- Chris Lu ------------------------- Instant Scalable Full-Text Search On Any Database/Application site: http://www.dbsight.net demo: http://search.dbsight.com Lucene Database Search in 3 minutes: http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes DBSight customer (remain anonymous per request) got 2.6 Million Euro funding! On Jan 7, 2008 5:36 AM, Shivani Sawhney <[EMAIL PROTECTED]> wrote: > Hi All, > > > > I am using Jarowinkler scoring in my current project for matching names. The > database of names against which the inputted value has to be matched is huge > and thus we are faced with performance issues. > > > > We now want lucene to help us here; we want lucene's speed for handling huge > data but still want to depend on Jarowinkler for its scoring. > > > > Some sample data that we expect a match between are, such as: > > * 'Lowery, Betty' and 'Lowery, Betty Sue' > * 'Malik, Mohammad Saleem' and 'Malik, Mohammad Salim' > > > > The questions I have are as follows: > > * What, in lucene, should be used for doing a match between such > Strings where a cut-off score is also to be provided while matching them? I > tried using fuzzy query in the following way but didn't know how to work > with it when the String to be matched has more than one word. > > > > QueryParser queryParser = new QueryParser(Constants.INDEX_KEY, new > StandardAnalyzer()); > > query = queryParser.parse(strSearchString + > Constants.LUCENE_FUZZY_QUERY_SYMBOL + Constants.MATCH_CUT_OFF); > > > > * Is Lucene scoring to be used here or simply the fuzzy query? > * Is there any way I can override the way Lucene's logic for String > match and use Jarowinkler there instead. > > > > > > > > Any help will be appreciated. > > > > Thanks > > Shivani Sawhney > > > > -- Chris Lu ------------------------- Instant Scalable Full-Text Search On Any Database/Application site: http://www.dbsight.net demo: http://search.dbsight.com Lucene Database Search in 3 minutes: http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes DBSight customer (remain anonymous per request) got 2.6 Million Euro funding! --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]