Hi myers,

Is this a parsing *problem*? IMHO the problem consists of several aspects:

a) defining all possible formats for address variants;
b) providing alternative parsing rules for the different formats (the
example formats suggest that this would be quite trivial);
c) each format's parse rule collects the address data and uses it to fill
out a standard address object;
d) the generated address objects are compared, duplicate objects are
identified and processed (removed?).
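A rough Python sketch of steps a) to d), just to show how the pieces could fit together. Everything here is an assumption: the names (Address, ABBREVIATIONS, HEADS, TAILS, RULES, parse, deduplicate) are hypothetical, the abbreviation table is tiny, and the patterns assume single-word cities, two-letter state and country codes and five-digit zips, so they only cover the example formats in the quoted message.

```python
import dataclasses
import re
from typing import Optional

# Hypothetical abbreviation table: only covers the quoted examples.
ABBREVIATIONS = {"street": "st", "apartment": "apt", "maryland": "md", "usa": "us"}
# Hypothetical street suffixes that are ignored when comparing addresses.
STREET_SUFFIXES = {"st", "ave", "rd"}

# Step a: format variants, written against the *normalized* text.
# Assumes single-word cities, 2-letter state/country codes, 5-digit zips.
HEADS = [
    r"(?P<unit_number>\d+)-(?P<street_number>\d+) (?P<street_name>[a-z ]+?)",
    r"(?P<street_number>\d+) (?P<street_name>[a-z ]+?) (?P<unit_type>apt|unit) (?P<unit_number>\d+)",
    r"(?P<unit_type>apt|unit) (?P<unit_number>\d+) (?P<street_number>\d+) (?P<street_name>[a-z ]+?)",
]
TAILS = [
    r"(?P<city>[a-z]+) (?P<state>[a-z]{2}) (?P<country>[a-z]{2}) (?P<postal_code>\d{5})",
    r"(?P<city>[a-z]+) (?P<state>[a-z]{2}) (?P<postal_code>\d{5}) (?P<country>[a-z]{2})",
]
# Step b: one alternative parse rule per head/tail combination.
RULES = [re.compile(f"{head} {tail}") for head in HEADS for tail in TAILS]


@dataclasses.dataclass(frozen=True)
class Address:
    """Step c: the standard address object all rules fill out."""
    unit_type: Optional[str]
    unit_number: Optional[str]
    street_number: str
    street_name: str
    city: str
    state: str
    country: str
    postal_code: str

    def match_key(self) -> tuple:
        """Key for step d: ignores unit type and a trailing street suffix."""
        words = self.street_name.split()
        if len(words) > 1 and words[-1] in STREET_SUFFIXES:
            words = words[:-1]
        return (self.unit_number, self.street_number, " ".join(words),
                self.city, self.state, self.country, self.postal_code)


def normalize(raw: str) -> str:
    """Lowercase, drop commas/periods, collapse whitespace, expand abbreviations."""
    text = raw.lower().replace(",", " ").replace(".", " ")
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in text.split())


def parse(raw: str) -> Optional[Address]:
    """Try each rule in turn; the first full match fills out an Address."""
    text = normalize(raw)
    for rule in RULES:
        m = rule.fullmatch(text)
        if m:
            groups = m.groupdict()  # unit_type is absent for the "234-500" head
            return Address(**{f.name: groups.get(f.name) for f in dataclasses.fields(Address)})
    return None  # no rule matched: a new format variant would be needed


def deduplicate(raw_addresses: list[str]) -> list[Address]:
    """Step d: keep the first Address seen for each match key."""
    unique: dict[tuple, Address] = {}
    for raw in raw_addresses:
        addr = parse(raw)
        if addr is not None:
            unique.setdefault(addr.match_key(), addr)
    return list(unique.values())


SAMPLES = [
    "234-500 Main Street Centertown, Maryland USA 12345",
    "500 Main St. Apartment 234 Centertown, Maryland 12345 US",
    "500 Main St Apt 234 Centertown, MD USA 12345",
    "Unit 234  500 Main Centertown MD USA 12345",
]

if __name__ == "__main__":
    for address in deduplicate(SAMPLES):
        print(address)
```

Building the rules as a cross product of "head" and "tail" patterns is one way to keep the list of format variants manageable. Note that the match key deliberately ignores the unit type and a trailing street suffix, which is what lets "Unit 234 ... Main" and "Apt 234 ... Main St" collapse into one record; whether that is acceptable (Main St vs. Main Ave!) is exactly the kind of decision step d) has to make.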



At 06:49 AM 5/19/00 -0400, you wrote:
>Have any of the parsing experts spent any time with the parsing of
>mailing addresses, e.g. recognizing that
>
>   234-500 Main Street Centertown, Maryland USA 12345
>
>is the same as:
>
>   500 Main St. Apartment 234 Centertown, Maryland 12345 US
>
>and
>
>   500 Main St Apt 234 Centertown, MD USA 12345
>
>and
>
>   Unit 234 500 Main Centertown MD USA 12345
>
>by parsing to a standard address format that might be something like:
>
>   unit type, unit number, street number, street name, city,
>   state/province, country, zip/postal code
>
>There are apparently lots of variants with different unit types
>acceptable, and multilingual versions of almost everything.

;- Elan >> [: - )]
