On Monday, August 4, 2003, at 05:12 PM, Joel Gwynn wrote:


Hey, all. We do lots of (snail) mailings, and we're looking for a fast,
customizable de-duping solution. We're currently taking a look at
doubletake from http://peoplesmith.com/, which is not too expensive, but
I was thinking there might be some perl stuff out there, given perl's
text-processing powers.

There's a wee script I wrote for The Perl Journal (TPJ) a while back that scrapes the U.S. Postal Service's address canonicalizer. The script is on tpj.com; look under Archives for the article called "Five Quick Hacks". The canonicalizer (well, they call it a "ZIP code locator" or something like that) transforms variants of the same address into the One True Address that the USPS recognizes, so de-duping then becomes a matter of simple string matching.
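
To make the "simple string matching" step concrete, here is a minimal sketch of de-duping once every address has been canonicalized. It is not the TPJ script: the `canonicalize` function below is a hypothetical local stand-in (it only uppercases and collapses whitespace) for whatever actually returns the USPS form, and the `name`/`address` record fields are assumed for illustration.

```python
def canonicalize(address):
    """Hypothetical placeholder for the real canonicalization step.

    In practice this would return the USPS-recognized form of the address;
    here it only uppercases and collapses runs of whitespace.
    """
    return " ".join(address.upper().split())


def dedupe(records, canonicalize=canonicalize):
    """Keep the first record seen for each canonical address string."""
    seen = {}
    for record in records:
        key = canonicalize(record["address"])
        # Exact string match on the canonical form decides what is a duplicate.
        seen.setdefault(key, record)
    # Dicts preserve insertion order, so first occurrences stay in input order.
    return list(seen.values())


if __name__ == "__main__":
    mailing_list = [
        {"name": "A. Reader", "address": "12 Main St   Boston MA"},
        {"name": "A Reader", "address": "12 main st boston ma"},
        {"name": "B. Writer", "address": "9 Elm St Boston MA"},
    ]
    for record in dedupe(mailing_list):
        print(record["name"], "|", record["address"])
```

Keying a dictionary on the canonical string keeps the pass linear in the number of records; how good the de-duping is depends entirely on how good the canonicalizer is.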


Won't help you for foreign addresses, obviously.

-Jon

_______________________________________________
Boston-pm mailing list
[EMAIL PROTECTED]
http://mail.pm.org/mailman/listinfo/boston-pm
