I would see if you can download a dedupe tool for this purpose. There is no
point in having ColdFusion do it when others have done the hard work :-)

-----Original Message-----
From: Matthew Reinbold
To: CF-Talk
Sent: Sun Jan 14 16:28:16 2007
Subject: Detecting (Almost) Matches for DeDuping?

I have a dataset that I've put together from a number of client files. To
this point I've been able to easily build a set of ColdFusion tools for
using the data, but there is a de-duping process that I need to do that I
just don't know how to approach.

The data has a series of first and last names. Most of the time I'm able to
match on last name, first name, and date of birth and create a unique entry
in the unified person table. The problem comes when the names are slightly
misspelled.

For example I may have:
RIVERA and RIVEERA -or-
MARTINEZ and MARTINE

and because I'm doing exact matching these are appearing as two separate
entries. I really don't want to eyeball the entire table (thousands of
lines) and manually pick out problem rows. And I don't think I can
completely automate the detection AND correction of dupes. At this point I
just want to run ColdFusion code, have it detect potential dupes, and then
let me take action.

How would I do this? Is there a regular expression that can detect whether
two strings are ALMOST matches? Any help or suggestions would be most
appreciated.
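
A regular expression is not a natural fit for "almost" matches. One common
alternative is edit distance (Levenshtein distance), which counts the
single-character inserts, deletes, and substitutions needed to turn one
string into another. The sketch below is only an illustration under stated
assumptions: it is written in Python rather than ColdFusion, the function
names edit_distance and potential_dupes are hypothetical, and the threshold
of one edit is a guess that would need tuning against the real data. It
only flags candidate pairs for review and corrects nothing.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b (hypothetical helper)."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def potential_dupes(names, max_distance=1):
    """Return pairs of distinct names within max_distance edits.

    max_distance=1 is an assumed starting point, not a recommendation.
    """
    unique = sorted({name.strip().upper() for name in names})
    return [
        (x, y)
        for x, y in combinations(unique, 2)
        if edit_distance(x, y) <= max_distance
    ]


# Sample values taken from the examples in the message, plus one control.
sample = ["RIVERA", "RIVEERA", "MARTINEZ", "MARTINE", "SMITH"]
for pair in potential_dupes(sample):
    print(pair)
```

Comparing every pair grows quadratically with the number of distinct names,
so on a large table it may help to compare only rows that already share
another field, such as date of birth.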


