Try googling "matchit" or "de-dupe software".

-----Original Message-----
From: Matthew Reinbold
To: CF-Talk
Sent: Sun Jan 14 17:31:51 2007
Subject: Re: Detecting (Almost) Matches for DeDuping?

Thanks for all the quick responses.

SoundEx is interesting, but it only finds names that sound the same - like
Johnson and Jonson. However, if a misspelling causes the two names to be
phonetically different - like Johnson and Jihnson - I don't believe it will
find that match.
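
That is easy to check with a small test. Below is a minimal sketch of
standard American Soundex in Python (the function name soundex is just for
illustration, and this is not the built-in from any particular database or
CFML engine). Because Soundex drops vowels, a vowel typo like Jihnson should
still get the same code as Johnson; it is a consonant typo such as Jphnson
that slips through.

```python
def soundex(name: str) -> str:
    """Return the 4-character American Soundex code for a name."""
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit

    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0]                 # the first letter is kept as-is
    prev = codes.get(letters[0], "")    # but its code can still suppress a repeat
    for ch in letters[1:]:
        if ch in "HW":
            continue                    # H and W do not separate repeated codes
        digit = codes.get(ch, "")       # vowels (and Y) have no code
        if digit and digit != prev:
            result += digit
        prev = digit                    # a vowel resets prev, so a repeat after it counts
    return (result + "000")[:4]         # pad or truncate to letter + 3 digits


print(soundex("Johnson"), soundex("Jonson"), soundex("Jihnson"))  # J525 J525 J525
print(soundex("Jphnson"))                                         # J152
```

So Soundex alone catches some typos and misses others, which is why a
character-level comparison is still worth having alongside it.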

I agree: if there's an available de-duping tool out there I'd use that, but
a few minutes spent running Google queries only seemed to turn up software
meant to do email list merges.

Thinking about it further, when I eyeball the data I'm really looking at three
different fields: Last Name, First Name, and Date of Birth (DOB).
 1) If the last name is the same or very similar (starts with the same
character, and the rest of the string is only 2 or 3 characters off with the
same basic cadence [vowels and consonants in approximately the same order]),
look at the first name.
 2) If the first name is an exact match, or only 2 or 3 characters off with
the same basic cadence, then look at date of birth (which isn't always
available).
 3) If the DOBs are the same, assume the names are the same and flag for
merging (this will probably require human intervention to pick which data
stays and which goes). If the DOBs are different, assume they are two
different people and leave the data alone. If DOB is available for only one
record or for neither, flag for merging and make a judgement call (if it's
John Smith and Jon Smith they could very well be different people; if it's
Desahanti Ouwiboaque and Deshanti Ouwiboaque, assume they are the same).
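
The three steps above can be sketched as a small decision function. This is
only an illustration in Python: the record layout (dicts with "last",
"first", and "dob" keys), the names similar and classify, and the 0.8
similarity cutoff are all assumptions, and difflib's ratio is used here as a
stand-in for the "2 or 3 characters off" test rather than an exact
implementation of it.

```python
from difflib import SequenceMatcher


def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Stand-in for 'same or very similar'; the 0.8 cutoff is a guess."""
    a, b = a.strip().lower(), b.strip().lower()
    if not a or not b or a[0] != b[0]:
        return False                      # must start with the same character
    return SequenceMatcher(None, a, b).ratio() >= threshold


def classify(rec1: dict, rec2: dict) -> str:
    """Return 'different', 'merge', or 'review' for a pair of records."""
    if not similar(rec1["last"], rec2["last"]):       # step 1
        return "different"
    if not similar(rec1["first"], rec2["first"]):     # step 2
        return "different"
    dob1, dob2 = rec1.get("dob"), rec2.get("dob")     # step 3
    if dob1 and dob2:
        return "merge" if dob1 == dob2 else "different"
    return "review"   # DOB missing on one or both records: judgement call


a = {"last": "Smith", "first": "John", "dob": "1970-01-01"}
b = {"last": "Smith", "first": "Jon", "dob": "1970-01-01"}
c = {"last": "Smith", "first": "Jon", "dob": None}
print(classify(a, b))  # merge
print(classify(a, c))  # review
```

Both "merge" and "review" would still go to a person; the split just
separates pairs confirmed by DOB from pairs that rest on the names alone.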

That seems like a straightforward process until I get to the point where I
want to flag names that differ by only 2 or 3 characters while keeping the
same cadence. I'm not even sure how to approach that one. Would I really be
better off forgetting about that code at this point and focusing on tools
for flagging matches by eye and on the merge piece?
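
One possible approach to the "2 or 3 differences with a cadence" check is
Levenshtein edit distance, applied once to the names and once to their
vowel/consonant patterns. The sketch below is in Python; the function names
and both thresholds (max_edits, max_cadence_edits) are assumptions to tune
against real data, not established values.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character inserts, deletes, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def cadence(name: str) -> str:
    """Reduce a name to its vowel/consonant pattern, e.g. 'Johnson' -> 'CVCCCVC'."""
    return "".join("V" if c in "aeiou" else "C"
                   for c in name.lower() if c.isalpha())


def nearly_same(a: str, b: str, max_edits: int = 2,
                max_cadence_edits: int = 1) -> bool:
    """Same first character, few character edits, and a similar cadence."""
    a, b = a.strip().lower(), b.strip().lower()
    if not a or not b or a[0] != b[0]:
        return False
    return (edit_distance(a, b) <= max_edits
            and edit_distance(cadence(a), cadence(b)) <= max_cadence_edits)


print(nearly_same("Johnson", "Jihnson"))      # True
print(nearly_same("Desahanti", "Deshanti"))   # True
print(nearly_same("Johnson", "Jackson"))      # False (3 substitutions)
```

A fixed allowance of 2 or 3 edits is generous for short names (Jon and Jan
are 1 edit apart), so scaling the limit to the name length may work better.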

(again, thanks for the quick responses)
Matthew


