Hi Mauro,

I'd search one of the Lucene mail archives for "record linkage"; you will find various conversations on the topic there [1]. Googling the term is also worthwhile. In particular, look for work by William Winkler at the U.S. Census Bureau, amongst others. There is also the SecondString package by William Cohen at CMU that may help, but I don't know whether it scales or how well supported it is.

Also see http://en.wikipedia.org/wiki/Jaro-Winkler as a starting point. In short, I think Lucene could facilitate such a system, but it probably isn't going to be the main piece.
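
To make the "no match" question a bit more concrete, here is a rough sketch in plain Python (not Lucene code) of Jaro-Winkler similarity plus a simple accept/reject rule. The function names, the record layout, and the cutoffs (min_score=0.9, min_margin=0.05) are all made up for illustration; real thresholds would have to be tuned on labelled data. The point is only that a per-field similarity normalized to [0, 1] makes a minimum score easier to reason about.

```python
from typing import Dict, List, Optional, Sequence


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3.0


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * prefix_scale * (1.0 - j)


def record_score(query: Dict[str, str], record: Dict[str, str],
                 fields: Sequence[str]) -> float:
    """Unweighted mean of per-field similarities (an assumed scoring rule)."""
    total = sum(jaro_winkler(query[f].lower(), record[f].lower()) for f in fields)
    return total / len(fields)


def best_match(query: Dict[str, str], records: List[Dict[str, str]],
               fields: Sequence[str], min_score: float = 0.9,
               min_margin: float = 0.05) -> Optional[Dict[str, str]]:
    """Return the single best record, or None when nothing is close enough
    or the top two candidates are too close to tell apart."""
    scored = sorted(((record_score(query, r, fields), r) for r in records),
                    key=lambda pair: pair[0], reverse=True)
    if not scored or scored[0][0] < min_score:
        return None
    if len(scored) > 1 and scored[0][0] - scored[1][0] < min_margin:
        return None
    return scored[0][1]


if __name__ == "__main__":
    db = [{"name": "Mauro", "surname": "Fraboni"},
          {"name": "Maria", "surname": "Rossi"}]
    fields = ["name", "surname"]
    print(best_match({"name": "Mauro", "surname": "Frabonni"}, db, fields))
    print(best_match({"name": "John", "surname": "Smith"}, db, fields))
```

In a setup like this, Lucene would more naturally be used to pull a small set of candidate records cheaply, with something like the rule above deciding whether the top candidate is accepted or rejected.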

-Grant

[1] http://lucene.markmail.org/message/nyz7hrmzgzkwporq?q=record+linkage

On Sep 29, 2008, at 9:12 AM, mauro fraboni wrote:

I am studying the possibility of using Lucene to build a
matching system for a database of subjects.
The subjects are stored as database records with fields such as
name, surname and address, and I would like to build a proximity
matcher that finds an input subject in the DB.
The idea is to map each record to a document, so that the fields of
the record become the fields of the document.

The problem is that my matching system should be quite accurate: it
should return only one matched subject (the one nearest to the
input) and no subject at all in other cases. I have not been able to
find a valid rule for the no-match case. Is it possible to define a
rule based on the score that says the input subject is not near
enough to any subject in the DB, so it should not be matched? Is it
possible to find a minimum score for this purpose?
Any suggestions will be appreciated.

ciao mauro

