I went looking for something like this a while back and came across
an algorithm called Soundex. It encodes a name by how it sounds, so two
names with slightly different spellings can end up with the same code.
I just did a search and found that Ben Forta wrote a UDF for it.
I'm not sure it will find /all/ of your duplicates, but it might help.

http://www.cflib.org/udf.cfm?ID=39
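
To show the idea, here is a rough sketch of standard American Soundex in
Python, plus a helper that groups names sharing a code so you can review
them by hand. It is not the UDF linked above (that one is CFML and may
differ in details); the names soundex and potential_dupes are made up for
this example.

```python
from collections import defaultdict

# Digit groups used by standard American Soundex.
_CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        _CODES[ch] = digit


def soundex(name):
    """Return the 4-character Soundex code for name ("" if it has no A-Z letters)."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]                      # the first letter is kept as-is
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")         # vowels, H, W and Y have no digit
        if digit and digit != prev:        # adjacent equal digits collapse
            code += digit
        if ch not in "HW":                 # H and W do not separate equal digits
            prev = digit
    return (code + "000")[:4]              # pad or truncate to 4 characters


def potential_dupes(names):
    """Hypothetical helper: map each Soundex code to the distinct names sharing it."""
    groups = defaultdict(set)
    for name in names:
        groups[soundex(name)].add(name.upper())
    return {code: sorted(found) for code, found in groups.items() if len(found) > 1}


if __name__ == "__main__":
    sample = ["RIVERA", "RIVEERA", "MARTINEZ", "MARTINE", "SMITH"]
    for code, found in potential_dupes(sample).items():
        print(code, found)
```

Both of the pairs from your message come out with matching codes (R160 and
M635), so they would be flagged for review. Two caveats: Soundex keeps the
first letter, so a typo there is missed, and unrelated names can share a
code, so treat the output as candidates only, not confirmed dupes.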

On 1/14/07, Matthew Reinbold <[EMAIL PROTECTED]> wrote:
> I have a dataset that I've put together from a number of client files. To 
> this point I've been able to easily build a set of ColdFusion tools for using 
> the data but there is a de-duping process that I need to do that I just don't 
> know how to approach.
>
> The data has a series of first and last names. Most of the time I'm
> able to detect last name, first name, and date of birth and create a unique
> entry in the unified person table. The problem comes when the names are
> slightly misspelled.
>
> For example I may have:
> RIVERA and RIVEERA -or-
> MARTINEZ and MARTINE
>
> and because I'm doing exact matching these are appearing as two separate
> entries. I really don't want to eyeball the entire table (thousands of lines) 
> and manually pick out problem rows. And I don't think I can completely 
> automate the detection AND correction of dupes. At this point I just want to 
> run ColdFusion code, have it detect potential dupes, and then let me take 
> action.
>
> How would I do this? Is a regular expression possible that can detect if two 
> strings are ALMOST matches? Any help or suggestions would be most appreciated.
>
> 

Archive: 
http://www.houseoffusion.com/groups/CF-Talk/message.cfm/messageid:266547