Hi myers,
Is this really a parsing *problem*? IMHO the problem consists of several aspects:
a) defining all possible formats for address variants;
b) providing alternative parsing rules for the different formats (the
example formats suggest that this would be quite trivial);
c) each format's parse rule collects the address data and uses it to
fill in a standard address object;
d) the generated address objects are compared, and duplicates are
identified and processed (removed?).
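
To make steps a) to d) a bit more concrete, here is a rough Python sketch that handles only the four example addresses quoted below. Everything in it is an assumption made for illustration: the names (Address, HEAD_RULES, parse_address, deduplicate) are hypothetical, the lookup tables are tiny, the city is assumed to be a single word, and the unit type is left out of the comparison because "234-500" carries none. A real solution would need many more rules and much larger tables.

```python
import re
from dataclasses import dataclass

# Hypothetical, deliberately tiny lookup tables (assumptions for this sketch).
STATES = {"MARYLAND": "MD", "MD": "MD"}
COUNTRIES = {"USA": "US", "US": "US"}
UNIT_TYPES = {"APARTMENT": "APT", "APT": "APT", "UNIT": "UNIT"}
STREET_SUFFIXES = {"STREET", "ST"}


@dataclass(frozen=True)
class Address:
    """The standard address object of step c)."""
    unit_type: str
    unit_number: str
    street_number: str
    street_name: str
    city: str
    state: str
    country: str
    postal_code: str

    def key(self):
        # Comparison key for step d). The unit type is ignored on purpose,
        # since some formats omit it and others spell it differently.
        return (self.unit_number, self.street_number, self.street_name,
                self.city, self.state, self.country, self.postal_code)


UNIT = "|".join(UNIT_TYPES)

# Steps a) and b): one rule per known layout of the street part.
HEAD_RULES = [
    # 234-500 Main Street
    re.compile(r"^(?P<unit>\d+)-(?P<num>\d+) (?P<street>.+)$"),
    # 500 Main St Apt 234
    re.compile(rf"^(?P<num>\d+) (?P<street>.+?) (?P<utype>{UNIT}) (?P<unit>\d+)$"),
    # Unit 234 500 Main
    re.compile(rf"^(?P<utype>{UNIT}) (?P<unit>\d+) (?P<num>\d+) (?P<street>.+)$"),
]


def parse_address(text):
    """Return an Address, or None if no rule matches."""
    tokens = re.sub(r"[.,]", " ", text).upper().split()
    if len(tokens) < 6:
        return None
    # Assumes the last four tokens are city, state, and country/postal code.
    *head, city, state, a, b = tokens
    # Country and postal code may appear in either order.
    postal, country = (a, b) if a.isdigit() else (b, a)
    if not postal.isdigit() or country not in COUNTRIES or state not in STATES:
        return None
    head_text = " ".join(head)
    for rule in HEAD_RULES:
        m = rule.match(head_text)
        if m is None:
            continue
        g = m.groupdict()
        street = g["street"].split()
        # Drop a trailing "STREET"/"ST" so "Main Street" equals "Main".
        if len(street) > 1 and street[-1] in STREET_SUFFIXES:
            street.pop()
        return Address(UNIT_TYPES.get(g.get("utype") or "", ""), g["unit"],
                       g["num"], " ".join(street), city, STATES[state],
                       COUNTRIES[country], postal)
    return None


def deduplicate(raw_addresses):
    """Step d): keep the first parsed address for each comparison key."""
    unique = {}
    for raw in raw_addresses:
        addr = parse_address(raw)
        if addr is not None:
            unique.setdefault(addr.key(), addr)
    return list(unique.values())


examples = [
    "234-500 Main Street Centertown, Maryland USA 12345",
    "500 Main St. Apartment 234 Centertown, Maryland 12345 US",
    "500 Main St Apt 234 Centertown, MD USA 12345",
    "Unit 234 500 Main Centertown MD USA 12345",
]
print(len(deduplicate(examples)))
```

Under these assumptions all four examples collapse to a single address. The hard part is clearly a), not the parsing itself: every new layout, unit type, or language means another rule or another table entry.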
At 06:49 AM 5/19/00 -0400, you wrote:
> Have any of the parsing experts spent any time with the parsing of
>mailing addresses e.g. recognizing that 234-500 Main Street
>Centertown, Maryland USA 12345 is the same as: 500 Main St. Apartment
>234 Centertown, Maryland 12345 US and 500 Main St Apt 234 Centertown,
>MD USA 12345 and Unit 234 500 Main Centertown MD USA 12345 by
>parsing to a standard address format that might be something like
>unit type, unit number, street number, street name, city, state/province,
>country, zip/postal code? There are apparently lots of variants with
>different unit types acceptable and multilingual versions of most
>everything.
;- Elan >> [: - )]