Good day,

Let me start off by saying that just by reading this list I've picked up a lot of great information that I have been able to put to use. (Now that I've flattered all you gurus...)

I have code that finds true (exact) duplicates in records:

    while (<INFILE>) {
        if (not $seen{$_}) {
            $seen{$_} = 1;
            print OUTFILE;
        } else {
            # this record is a duplicate
        }
    }

I've gone so far as to manipulate $_ to remove \W globally and make it uppercase (so that Mr Zorkoff and MR ZORKOFF get flagged as duplicates). This works well too.

What I'd like to do next, if it's possible, is to catch items like the following:

    Mr. Tom Zorkoff 123 Elm St. NE Chicago, Illinois
    Mr. T. Zorkoff 123 Elm Street North East, Chicago, IL

To do this, I'm hoping there is a way I can calculate the number of differences between line 1 and line 2 and use a percentage to decide whether the record should be considered unique or not.

I don't understand how (or if) $_ could tell that $seen{$_} is a potential candidate, and then go so far as counting the differences between the two. My thought is that I could get a scalar count from a match, i.e.:

    $result = () = $_ =~ /$seen{$_}/g;

or something like that, but I'm afraid that as soon as there is a single difference between the two, $result will be 0 (false).

Am I barking up a beanstalk? I hope this makes sense. Any and all help is greatly appreciated.

Thanks,
Carl
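
For what it's worth, here is one way the "count the differences and use a percentage" idea could be sketched. It uses Levenshtein edit distance, which is a technique brought in for illustration and not part of the code above. The sub names (normalize, edit_distance, similarity, dedupe) and the threshold passed to dedupe are assumptions, and the in-memory @records list stands in for INFILE. Note that a hash lookup can only find exact matches, so each record is compared against every record kept so far, which gets slow on large files.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Normalise as described above: uppercase and strip non-word characters.
sub normalize {
    my ($rec) = @_;
    $rec = uc $rec;
    $rec =~ s/\W//g;
    return $rec;
}

# Levenshtein edit distance (insertions, deletions, substitutions),
# keeping only the previous row of the matrix.
sub edit_distance {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;
    my @prev = (0 .. scalar @t);
    for my $i (1 .. @s) {
        my @cur = ($i);
        for my $j (1 .. @t) {
            my $cost = $s[$i - 1] eq $t[$j - 1] ? 0 : 1;
            $cur[$j] = min($prev[$j] + 1, $cur[$j - 1] + 1, $prev[$j - 1] + $cost);
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Similarity as a percentage; 100 means identical after normalisation.
sub similarity {
    my ($x, $y) = map { normalize($_) } @_;
    my $longest = max(length($x), length($y));
    return 100 if $longest == 0;
    return 100 * (1 - edit_distance($x, $y) / $longest);
}

# Split records into kept and flagged. A record is flagged when it is at
# least $threshold percent similar to a record already kept.
# $threshold is a hypothetical cut-off that needs tuning on real data.
sub dedupe {
    my ($threshold, @records) = @_;
    my (@kept, @flagged);
    RECORD: for my $rec (@records) {
        for my $old (@kept) {
            if (similarity($rec, $old) >= $threshold) {
                push @flagged, $rec;
                next RECORD;
            }
        }
        push @kept, $rec;
    }
    return (\@kept, \@flagged);
}

my @records = (
    'Mr. Tom Zorkoff 123 Elm St. NE Chicago, Illinois',
    'Mr. T. Zorkoff 123 Elm Street North East, Chicago, IL',
);
printf "similarity: %.1f%%\n", similarity(@records);
```

Abbreviations such as St./Street or IL/Illinois cost several edits each, so the score for the two sample records is well below 100. Expanding common abbreviations before comparing would probably raise it, but that step is not shown here.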