*snip*

So this could be:

( This is an email address )paul( an end &&^%%% to the "$%£$%£ email address
as a comment \sdfsdfsdf )@pjnetsolutions.com

Which you'll plainly agree is valid but completely silly!

Taking the local-part quoted-string idea you could have:

[CFWS] DQUOTE *([FWS] qcontent) [FWS] DQUOTE [CFWS]

 " " 

As a "valid" email or:

 " There is nothing to stop this being valid email address "@pjnetsolutions.com

Which also follows the spec!

It all leads to a confusing and plainly quite silly regex when actually we
know pretty much what we want.  Although one day I might actually try and do
this!

*snip*

This is exactly the sort of thing that I started to run into when I had
my supposedly all-encompassing uber-regex-stack. It's actually quite
tricky to strip all the comments and folding whitespace out of some of
the email addresses people use.
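
To illustrate why a single regex struggles here: comments can nest and
can contain quoted-pairs, so stripping them needs a depth counter rather
than a pattern. The following is only a rough Python sketch under that
assumption; the function name strip_comments is made up for this
example, and it does not attempt to handle folding whitespace or the
obsolete syntax.

```python
def strip_comments(text: str) -> str:
    """Remove (possibly nested) parenthesised comments from an address."""
    out = []
    depth = 0          # current comment nesting level
    in_quotes = False  # inside a quoted-string, parentheses are literal
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            # A quoted-pair is a backslash plus exactly one character.
            if depth == 0:
                out.append(text[i:i + 2])
            i += 2
            continue
        if depth == 0 and ch == '"':
            in_quotes = not in_quotes
            out.append(ch)
        elif not in_quotes and ch == "(":
            depth += 1
        elif not in_quotes and ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)
        i += 1
    if depth != 0 or in_quotes:
        raise ValueError("unbalanced comment or quoted-string")
    return "".join(out)


print(strip_comments("(work)paul(nested (comment))@pjnetsolutions.com"))
# prints: paul@pjnetsolutions.com
```

Even this leaves plenty of cases unhandled, which is rather the point.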

Like I said, I found that it caused more problems than it solved, and I
concluded that ultimately I would just have to accept a certain amount
of rubbish in my email fields.

I also found that having the password sent in a confirmation email
helped to reduce the amount of dirty data.
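
As a loose illustration of the confirmation idea (not the code from the
application mentioned here), a sign-up can be held as pending until the
user proves they received the mail. Everything in this Python sketch is
an assumption made for the example: the in-memory dict stands in for a
database, the function names are invented, and the actual sending of
the email is left out.

```python
import secrets

pending = {}       # token -> address awaiting confirmation (stand-in for a DB)
confirmed = set()  # addresses that have been proven to receive mail


def start_signup(address: str) -> str:
    """Record a pending sign-up and return the token to mail to the user."""
    token = secrets.token_urlsafe(16)
    pending[token] = address
    return token


def confirm(token: str) -> bool:
    """Mark the address as confirmed if the token is known; tokens are single use."""
    address = pending.pop(token, None)
    if address is None:
        return False
    confirmed.add(address)
    return True
```

An address that was mistyped never gets confirmed, so the rubbish stays
out of the confirmed set without any clever validation.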

That is, of course, a solution that was specific to the application that
I was building, but it's worth considering nonetheless. It does really
depend on why you are trying to validate an email address in the first
place, but usually it's because you want to send an email to the user.
In that case, you probably want to make sure the user has a good reason
for putting the address in correctly, such as getting the password for
their account, or having it activated. Once you've done that, you really
only want to check for typos, which is something that can be done with a
relatively simple regex for the majority of email addresses.
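
For example, a typo check along these lines might be enough. This is a
Python sketch of one possible "relatively simple regex", not a spec
implementation: the pattern and the name looks_like_email are
assumptions for illustration, and it will reject some addresses the RFC
allows (quoted local parts containing spaces, comments, and so on).

```python
import re

# Deliberately loose: exactly one "@", no whitespace, and a domain made of
# at least two non-empty labels separated by dots.
SIMPLE_EMAIL = re.compile(r"[^@\s]+@[^@\s.]+(\.[^@\s.]+)+")


def looks_like_email(address: str) -> bool:
    """Return True if the trimmed address passes the loose typo check."""
    return SIMPLE_EMAIL.fullmatch(address.strip()) is not None


print(looks_like_email("paul@pjnetsolutions.com"))  # prints: True
print(looks_like_email("paul@pjnetsolutions"))      # prints: False
```

It catches a missing "@", stray spaces and a missing or doubled dot in
the domain, and leaves the rest to the confirmation email.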

As I said though, this all depends on why you want to validate the email
addresses in the first place, so you can take what you will from the
above paragraph.

Spike

Stephen Milligan
Team Macromedia - ColdFusion
Co-author 'Reality Macromedia ColdFusion MX: Intranets and Content
Management'
http://spikefu.blogspot.com

> -----Original Message-----
> From: Paul Johnston [mailto:[EMAIL PROTECTED]] 
> Sent: 28 January 2003 16:00
> To: [EMAIL PROTECTED]
> Subject: RE: [ cf-dev ] Regular Expression for Email and 
> Domain checking - it works!
> 
> 
> Spike,
> 
> At the risk of losing everyone else on the list... See inline:
> 
> > Hmmm...
> > 
> > According to the definition I'm looking at, local-part consists of:
> > 
> > dot-atom / quoted-string / obs-local-part
> 
> Agreed.  I took the dot-atom approach
> 
> > dot-atom allows:
> > 
> > [CFWS] dot-atom-text [CFWS]
> 
> So this basically is whitespace AROUND the text, which means 
> that if you trim the string, you're okay (that's basically 
> how I took it anyway).
> 
> > dot-atom-text allows:
> > 
> > 1*atext *("." 1*atext)
> 
> Which says: allow 1 or more atext, then 0 or more occurrences 
> of "." and then 1 or more atext.
> 
> So basically it's saying you can't start or finish with a "." 
> and everything must be atext.
> 
> > atext allows:
> > 
> > [CFWS] 1*atext [CFWS]
> > 
> > and consists of:
> > 
> > ALPHA / DIGIT / ; Any character except controls.
> 
> Atext actually consists of:
> 
> atext           =       ALPHA / DIGIT / ; Any character except controls,
>                         "!" / "#" /     ;  SP, and specials.
>                         "$" / "%" /     ;  Used for atoms
>                         "&" / "'" /
>                         "*" / "+" /
>                         "-" / "/" /
>                         "=" / "?" /
>                         "^" / "_" /
>                         "`" / "{" /
>                         "|" / "}" /
>                         "~"
> 
> The comments are there to explain what the code says (and for 
> everyone else, the "/" is basically an OR).  So atext is:
> 
> [a-zA-Z0-9!#$%&'*+/=?^_`{|}~-] (unescaped)
> 
> So then the regex becomes:
> 
> [a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+([.][a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+)*
> 
> Which when escaped is exactly what is there.
> 
> > 
> > That appears to suggest that dot-atom allows comments
> > containing folding whitespace (CFWS).
> > 
> 
> It does yes.
> 
> > quoted-string allows:
> > 
> > [CFWS] DQUOTE *([FWS] qcontent) [FWS] DQUOTE [CFWS]
> > 
> > You can recurse back up through the various definitions of 
> > each of those if you want, but clearly folding whitespace 
> > (FWS) is permitted.
> 
> Agreed.
> 
> CFWS            =       *([FWS] comment) (([FWS] comment) / FWS)
> FWS             =       ([*WSP CRLF] 1*WSP) /   ; Folding white space
>                         obs-FWS
> WSP            =  SP / HTAB   ; white space
> SP             =  %x20 ; space char
> HTAB           =  %x09 ; horizontal tab
> 
> From section 2.2.3:
> 
> "The general rule is that wherever this standard allows for folding
> white space (not simply WSP characters), a CRLF may be inserted before
> any WSP"
> 
> So therefore FWS is basically whitespace that can have 
> linebreaks in it.
> 
> So CFWS is commented whitespace. What's a comment? Here:
> 
> quoted-pair     =       ("\" text) / obs-qp
> ctext           =       NO-WS-CTL /     ; Non white space controls
>                         %d33-39 /       ; The rest of the US-ASCII
>                         %d42-91 /       ;  characters not including "(" (40),
>                         %d93-126        ;  ")" (41), or "\" (92)
> ccontent        =       ctext / quoted-pair / comment
> comment         =       "(" *([FWS] ccontent) [FWS] ")"
> CFWS            =       *([FWS] comment) (([FWS] comment) / FWS)
> qcontent        =       qtext / quoted-pair
> qtext           =       NO-WS-CTL /     ; Non white space controls
>                         %d33 /          ; The rest of the US-ASCII
>                         %d35-91 /       ;  characters not including "\"
>                         %d93-126        ;  or the quote character
> 
> Remembering what the NO-WS-CTL, text and obsolete bits are:
> 
> NO-WS-CTL       =       %d1-8 /         ; US-ASCII control characters
>                         %d11 /          ;  that do not include the
>                         %d12 /          ;  carriage return, line feed,
>                         %d14-31 /       ;  and white space characters
>                         %d127
> text            =       %d1-9 /         ; Characters excluding CR and LF
>                         %d11 /
>                         %d12 /
>                         %d14-127 /
>                         obs-text
> obs-qp          =       "\" (%d0-127)
> 
> So basically a "comment" is:
> 
> ( anything I like in here whatsoever as long as it doesn't have bare
> brackets or backslash, with optional whitespace or tab characters after
> the opening bracket and before the closing bracket, although it can
> contain another comment ( like this ) or a quoted pair like this \this )
> 
> Although I don't see how you end a quoted-pair, unless it guesses.
> 
> So you could have an email like this:
> 
> [CFWS] dot-atom-text [CFWS]
> Or
> [CFWS] 1*atext *("." 1*atext) [CFWS]
> 
> So this could be:
> 
> ( This is an email address )paul( an end &&^%%% to the "$%£$%£ email
> address as a comment \sdfsdfsdf )@pjnetsolutions.com
> 
> Which you'll plainly agree is valid but completely silly!
> 
> Taking the local-part quoted-string idea you could have:
> 
> [CFWS] DQUOTE *([FWS] qcontent) [FWS] DQUOTE [CFWS]
> 
>  " " 
> 
> As a "valid" email or:
> 
>  " There is nothing to stop this being valid email address "@pjnetsolutions.com
> 
> Which also follows the spec!
> 
> It all leads to a confusing and plainly quite silly regex when actually
> we know pretty much what we want.  Although one day I might actually try
> and do this!
> 
> Paul
> 
> 
> 
> 
> -- 
> ** Archive: http://www.mail-archive.com/dev%40lists.cfdeveloper.co.uk/
> 
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> For human help, e-mail: [EMAIL PROTECTED]
> 
> 

