It's like deja vu all over again. I literally just finished a similar task (about 2 hours ago). I didn't use Lucene for it, although I suppose I could have. As a place to start, Lucene does have FuzzyQuery (http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/FuzzyQuery.html), which matches terms by Levenshtein edit distance.

There are other string-matching algorithms as well, used in various approaches. See http://en.wikipedia.org/wiki/Edit_distance. Googling "record linkage" may also help. From there, you can pretty much knock yourself out with all the different approaches.
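
To make the edit-distance idea concrete, here is a rough sketch in plain Java (no Lucene involved) of one way to score two names. It is not what I used and not how FuzzyQuery works internally; the class name NameMatch, the token-sorting normalization, and the "1 - distance / longer length" score are all assumptions made up for illustration.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Lucene.
final class NameMatch {

    // Classic dynamic-programming Levenshtein distance:
    // insertions, deletions and substitutions each cost 1.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed normalization: lowercase, turn anything that is not a letter
    // into a space, then sort the tokens so that word order and commas
    // ("Smith, John" vs. "John Smith") do not affect the score.
    static String normalize(String name) {
        String cleaned = name.toLowerCase(Locale.ROOT)
                             .replaceAll("[^\\p{L}]+", " ")
                             .trim();
        String[] tokens = cleaned.split("\\s+");
        Arrays.sort(tokens);
        return String.join(" ", tokens);
    }

    // Assumed score in [0, 1]: 1 minus the distance divided by the longer
    // normalized length. 1.0 means identical after normalization.
    static double similarity(String a, String b) {
        String x = normalize(a);
        String y = normalize(b);
        int maxLen = Math.max(x.length(), y.length());
        if (maxLen == 0) {
            return 1.0;
        }
        return 1.0 - (double) levenshtein(x, y) / maxLen;
    }

    public static void main(String[] args) {
        String reference = "John Smith";
        for (String candidate : new String[] {"Smith John", "Smith, John", "J. Smith"}) {
            System.out.println(candidate + " -> " + similarity(reference, candidate));
        }
    }
}
```

With this scoring, "Smith John" and "Smith, John" come out identical to "John Smith" after normalization, while "J. Smith" scores roughly 0.7 because three letters are missing. Plain edit distance treats an initial as a partial spelling, so a real record-linkage setup would probably want to handle initials and nicknames separately.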

On Apr 5, 2007, at 3:58 PM, moraleslos wrote:


I was wondering if anyone has done people name matching using Lucene. For example, I have a name coming from some external source that I would like to match with the one I have in my DB. Let's say my DB contains the name "John Smith". If the external source has something like "Smith John", "Smith, John", "J. Smith", etc., I would like to rate the match by some % of closeness for later review. I've searched around a bit for algorithms and I kept seeing the Levenshtein distance algorithm, which I'm sure Lucene uses under the hood. So I'm trying to gauge whether Lucene is useful for something as specific as this, or whether there are better algorithms and/or software out there for name matching. Thanks in advance!

-los
--
View this message in context: http://www.nabble.com/Lucene-for-name-matching-tf3533454.html#a9862342
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org

Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ


