It's like deja vu all over again. I literally just finished up a
similar task (about 2 hours ago). I didn't use Lucene for it,
although I suppose I could have. As a place to start, Lucene does
have FuzzyQuery, which matches terms by Levenshtein distance:
http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/FuzzyQuery.html
There are other string matching algorithms as well, and different
approaches use different ones. See
http://en.wikipedia.org/wiki/Edit_distance. Googling "record linkage"
may help too. From there, you can pretty much knock yourself out
with all the different approaches.
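To make the edit-distance idea concrete, here is a minimal sketch in
plain Java (no Lucene) that turns Levenshtein distance into a
percentage score for names. The class name NameMatch, the normalize()
step (lowercase, punctuation stripped, tokens sorted so word order
does not matter) and the scoring formula are all my own assumptions
for illustration. This is not what FuzzyQuery does internally, and it
only handles ASCII names.

```java
import java.util.Arrays;

public class NameMatch {

    // Classic dynamic-programming Levenshtein distance:
    // insertions, deletions and substitutions each cost 1.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Hypothetical normalization (an assumption, ASCII only): lowercase,
    // turn punctuation into spaces, then sort the tokens so that
    // "Smith, John" and "John Smith" normalize to the same string.
    static String normalize(String name) {
        String cleaned = name.toLowerCase().replaceAll("[^a-z0-9 ]", " ").trim();
        String[] tokens = cleaned.split("\\s+");
        Arrays.sort(tokens);
        return String.join(" ", tokens);
    }

    // Similarity in [0, 1]: 1 minus the distance divided by the longer length.
    static double similarity(String a, String b) {
        String x = normalize(a);
        String y = normalize(b);
        int longest = Math.max(x.length(), y.length());
        if (longest == 0) {
            return 1.0;
        }
        return 1.0 - (double) levenshtein(x, y) / longest;
    }

    public static void main(String[] args) {
        String reference = "John Smith";
        String[] candidates = {"Smith John", "Smith, John", "J. Smith", "Jon Smyth"};
        for (String candidate : candidates) {
            System.out.printf("%-12s %.0f%%%n", candidate,
                              100 * similarity(reference, candidate));
        }
    }
}
```

With this normalization, "Smith John" and "Smith, John" come out
identical to "John Smith", while "J. Smith" is penalized for the
three missing letters. Handling initials, nicknames or phonetic
variants would need extra rules on top, which is where the record
linkage literature comes in.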
On Apr 5, 2007, at 3:58 PM, moraleslos wrote:
I was wondering if anyone has done people name matching using Lucene.
For example, I have a name coming from some external source that I
would like to match with the one I have in my DB. Let's say my DB
contains the name "John Smith". If the external source has something
like "Smith John", "Smith, John", "J. Smith", etc., I would like to
rate this matching based on some % of closeness for review later.
I've searched around a bit for algorithms and I kept seeing the
Levenshtein distance algorithm, which I'm sure Lucene uses under the
hood. So I'm trying to gauge whether Lucene is useful for doing
something as specific as this, or whether there are better algorithms
and/or software out there that do name matching. Thanks in advance!
-los
--
View this message in context:
http://www.nabble.com/Lucene-for-name-matching-tf3533454.html#a9862342
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org
Read the Lucene Java FAQ at
http://wiki.apache.org/jakarta-lucene/LuceneFAQ