I have a dataset that I've put together from a number of client files. So far I've been able to build a set of ColdFusion tools for using the data easily, but there is a de-duping process I need to do that I just don't know how to approach.
The data has a series of first and last names. Most of the time I'm able to detect last name, first name, and date of birth and create a unique entry in the unified person table. The problem comes when the names are slightly misspelled. For example, I may have RIVERA and RIVEERA, or MARTINEZ and MARTINE, and because I'm doing exact matching these appear as two separate entries.

I really don't want to eyeball the entire table (thousands of lines) and manually pick out problem rows, and I don't think I can completely automate the detection AND correction of dupes. At this point I just want to run ColdFusion code, have it detect potential dupes, and then let me take action.

How would I do this? Is there a regular expression that can detect whether two strings are ALMOST matches? Any help or suggestions would be most appreciated.

Archive: http://www.houseoffusion.com/groups/CF-Talk/message.cfm/messageid:266546
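
A possible direction, offered as a rough sketch rather than a tested solution: a regular expression is not well suited to "almost equal" comparisons, but an edit distance (Levenshtein distance) counts how many single-character inserts, deletes, or substitutions separate two strings, and both example pairs above are exactly one edit apart. The sketch below is in Python only for illustration and would need to be ported to ColdFusion. The function names, the tuple layout, the threshold of 1, and the sample first names and dates of birth are all made up for the example; only the last names come from the post.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between strings a and b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def possible_dupes(people, max_distance=1):
    """Flag pairs with the same date of birth and nearly identical names.

    people is a list of (last_name, first_name, dob) tuples; this layout
    is an assumption, not the real table structure.
    """
    flagged = []
    for p, q in combinations(people, 2):
        if p[2] != q[2]:
            continue  # different date of birth: not treated as a candidate
        d = (edit_distance(p[0].upper(), q[0].upper())
             + edit_distance(p[1].upper(), q[1].upper()))
        # d == 0 is an exact match, which exact matching already handles.
        if 0 < d <= max_distance:
            flagged.append((p, q, d))
    return flagged


# Hypothetical sample rows; first names and dates are invented.
sample = [
    ("RIVERA", "ANA", "1970-01-01"),
    ("RIVEERA", "ANA", "1970-01-01"),
    ("MARTINEZ", "LUIS", "1982-05-17"),
    ("MARTINE", "LUIS", "1982-05-17"),
    ("MARTINEZ", "LUIS", "1990-03-02"),
]

for first, second, distance in possible_dupes(sample):
    print(first, "<->", second, "distance", distance)
```

Comparing every pair is quadratic in the number of rows, so for a table of thousands of lines it may be worth grouping rows by date of birth first and only comparing names within each group. The flagged pairs are only candidates for manual review, which fits the goal of detecting potential dupes without correcting them automatically.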

