Try googling "matchit" or "de-dupe software".

-----Original Message-----
From: Matthew Reinbold
To: CF-Talk
Sent: Sun Jan 14 17:31:51 2007
Subject: Re: Detecting (Almost) Matches for DeDuping?

Thanks for all the quick responses.

SoundEx is interesting, but it only finds names that sound the same, like Johnson and Jonson. However, if a misspelling causes the two names to be phonetically different, like Johnson and Jihnson, I don't believe it will find that match.

I agree, if there's some available de-duping tool out there I'd use that, but a few minutes spent running Google queries only seemed to turn up software meant to do email list merges.

Thinking about it further, when I eyeball it I'm really looking at three different fields: the Last Name, the First Name, and Date of Birth.

1) If the last name is the same or very similar (starts with the same character, and the rest of the string is only 2 or 3 characters off with the same basic cadence [vowels and consonants in approximately the same order]), look at the first name.
2) If the first name is an exact match or only 2 or 3 characters off the basic cadence, then look at date of birth (which isn't always available).
3) If the DOB is the same, assume the names are the same and flag for merging (will probably require human intervention to pick which data stays and which goes). If the DOBs are different, assume they are two different people and leave the data alone. If DOB is available for only one, the other, or neither, flag for merging and make a judgement call (if it's John Smith and Jon Smith they very well could be different people; if it's Desahanti Ouwiboaque and Deshanti Ouwiboaque, assume they are the same).

That seems like a straightforward process until I get to the point where I want to flag names that only have 2 or 3 differences with a cadence. I'm not even sure how to approach that one. Would I really be better off just forgetting about that code at this point and focusing on tools to flag columns for eyeballing and on the merge piece?

(again, thanks for the quick responses)

Matthew

Archive: http://www.houseoffusion.com/groups/CF-Talk/message.cfm/messageid:266552
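For what it's worth, the three-step check described above can be roughed out in a few lines. This is only a sketch, in Python rather than ColdFusion, and it makes some assumptions that are not in the original message: it uses plain Levenshtein edit distance as a stand-in for "2 or 3 characters off" (it does not model the vowel/consonant cadence), the cutoff `max_edits=2` is an arbitrary starting value, and the names `edit_distance`, `names_close`, `classify_pair` and the three result strings are made up for illustration.

```python
from typing import Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character inserts, deletes or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete char_a
                               current[j - 1] + 1,       # insert char_b
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


def names_close(a: str, b: str, max_edits: int = 2, same_initial: bool = False) -> bool:
    """True if the two names are within max_edits edits (case-insensitive)."""
    a, b = a.strip().lower(), b.strip().lower()
    if not a or not b:
        return False
    if same_initial and a[0] != b[0]:
        return False
    return edit_distance(a, b) <= max_edits


def classify_pair(last1: str, first1: str, dob1: Optional[str],
                  last2: str, first2: str, dob2: Optional[str],
                  max_edits: int = 2) -> str:
    """Apply the last name -> first name -> DOB steps to one pair of records."""
    # Step 1: last names must share a first character and be nearly identical.
    if not names_close(last1, last2, max_edits, same_initial=True):
        return "different"
    # Step 2: first names must be identical or nearly identical.
    if not names_close(first1, first2, max_edits):
        return "different"
    # Step 3: DOB decides; a missing DOB on either side needs a human.
    if dob1 is None or dob2 is None:
        return "needs_judgement"
    return "flag_for_merge" if dob1 == dob2 else "different"


print(classify_pair("Johnson", "Matt", "1975-03-02", "Jihnson", "Matt", "1975-03-02"))  # flag_for_merge
print(classify_pair("Smith", "John", None, "Smith", "Jon", "1970-01-01"))               # needs_judgement
```

A fixed cutoff of 2 is generous for very short names (two unrelated three-letter names can easily be 2 edits apart), so in practice the threshold would probably need to scale with name length, and anything flagged should still go to a person for the actual merge decision.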

