On Tue, Nov 29, 2011 at 4:54 PM, Sheppy R <bobross...@gmail.com> wrote:

> Couldn't you just use the non-whitespace character to capture everything
> before and after the @ symbol?
>
> s/^.*\s(\S+@\S+)\s.*$/$1/

Yes, you could, of course, but this is why I said there was nearly no syntax
checking: the minor check that ensures there is a "." in there helps to weed
out funny non-email addresses such as mystuff@someplace.
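
A rough sketch of the difference between the two approaches follows. It is in
Python rather than Perl, and it uses re.findall instead of a one-shot
substitution, so it collects every candidate on a line rather than only the
last whitespace-delimited one. The WITH_DOT pattern is a hypothetical
illustration of a "must contain a dot after the @" check, not necessarily the
exact check suggested earlier in the thread.

```python
import re

# Loose capture, equivalent in spirit to the \S+@\S+ group in
# s/^.*\s(\S+@\S+)\s.*$/$1/ (but findall returns every match).
LOOSE = re.compile(r"\S+@\S+")

# Hypothetical variant: the part after "@" must contain a "." with
# non-whitespace, non-"@" characters on both sides of it.
WITH_DOT = re.compile(r"[^\s@]+@[^\s@]+\.[^\s@]+")


def candidates(text, pattern):
    """Return every substring of text that matches pattern."""
    return pattern.findall(text)


sample = "mail rob@example.com or mystuff@someplace today"
print(candidates(sample, LOOSE))     # ['rob@example.com', 'mystuff@someplace']
print(candidates(sample, WITH_DOT))  # ['rob@example.com']
```

Both patterns accept rob@example.com; only the looser one lets
mystuff@someplace through.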

The biggest problem with email addresses is that the rules for how one may be
formatted are so relaxed, and therefore so complex, that to the best of my
knowledge nobody has ever managed to write a 100% correct regular expression
that checks whether a string meets all the criteria of a valid email address.
On top of that, even if you were to manage it, the specification allows a few
forms that are valid but are unlikely to be accepted by any mail server or
mail client.

Take, for instance, the address 1....1....1....1@something...@somewhere.info:
this is technically a valid email address, but I can already tell you that
your mail client is likely to choke on it, and the mail server at
somewhere.info will not like it much either. This is the gap between email
addresses as they are actually used and what the specification allows.

I would personally try to avoid doing any sanity checking on the addresses
you filter out, at least on the first pass. After all, the reason you are
filtering email addresses is that in the end it is better to have a few
non-existent addresses than to miss a few valid ones: the cost of missing an
email address is an order of magnitude bigger than the cost of sending out an
email that bounces because the address does not exist. Therefore I suggest
that on the first pass you grab anything that smells like an email address.
In the next step you can filter further, for instance keeping only those
whose domain name your DNS server can look up. The step after that is to try
mailing them and filter out the ones that bounce, so check your email account
for any messages stating that the receiving mail server does not know the
account. In the end you will have a list of valid, existing email addresses
that you can then spam to no end, or, well, whatever else your intention is
with them ;-)
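
The first two passes described above could look roughly like the following
Python sketch. Assumptions: the names first_pass, second_pass,
domain_resolves and the CANDIDATE pattern are hypothetical; the DNS step uses
socket.getaddrinfo (an A/AAAA lookup through the system resolver) as a
stand-in, because actual mail routing uses MX records, which the standard
library cannot query directly. The bounce-checking pass stays manual, as
described.

```python
import re
import socket
from typing import Callable, Iterable, List

# Pass 1: deliberately loose. Anything without whitespace or common
# delimiters on either side of an "@" counts as a candidate.
CANDIDATE = re.compile(r"[^\s@<>()\[\],;:\"']+@[^\s@<>()\[\],;:\"']+")


def first_pass(text: str) -> List[str]:
    """Collect candidates, drop a sentence-ending '.', de-duplicate in order."""
    stripped = (m.rstrip(".") for m in CANDIDATE.findall(text))
    return list(dict.fromkeys(stripped))


def domain_resolves(domain: str) -> bool:
    """Stand-in DNS check (assumption): True if the system resolver finds
    any address for the domain. Does not look at MX records."""
    try:
        socket.getaddrinfo(domain, None)
        return True
    except (socket.gaierror, UnicodeError):
        return False


def second_pass(addresses: Iterable[str],
                resolves: Callable[[str], bool] = domain_resolves) -> List[str]:
    """Pass 2: keep only addresses whose domain part can be looked up."""
    return [a for a in addresses if resolves(a.rsplit("@", 1)[1])]


if __name__ == "__main__":
    text = "Contact rob@example.com or rob@example.com. Old: jane@no-such-host.invalid."
    found = first_pass(text)
    print(found)  # ['rob@example.com', 'jane@no-such-host.invalid']
    # A fake resolver keeps the example offline and deterministic.
    print(second_pass(found, resolves=lambda d: d == "example.com"))
```

Passing the resolver in as a parameter keeps the filtering logic separate
from the network lookup, so the lookup can be swapped for a proper MX query
later without touching the rest.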

Regards,

Rob Coops
