Spike,

At the risk of losing everyone else on the list... See inline:

> Hmmm...
> 
> According to the definition I'm looking at, local-part consists of:
> 
> dot-atom / quoted-string / obs-local-part

Agreed.  I took the dot-atom approach.

> dot-atom allows:
> 
> [CFWS] dot-atom-text [CFWS]

So this basically is optional comments and whitespace AROUND the text, which
means that if you trim the string (and don't expect comments), you're okay
(that's basically how I took it anyway).

> dot-atom-text allows:
> 
> 1*atext *("." 1*atext)

Which says: allow 1 or more atext, then 0 or more occurrences of "." followed
by 1 or more atext.

So basically it's saying you can't start or finish with a "." and everything
must be atext.

> atext allows:
> 
> [CFWS] 1*atext [CFWS] 
> 
> and consists of:
> 
> ALPHA / DIGIT / ; Any character except controls.

Atext actually consists of:

atext           =       ALPHA / DIGIT / ; Any character except controls,
                        "!" / "#" /     ;  SP, and specials.
                        "$" / "%" /     ;  Used for atoms
                        "&" / "'" /
                        "*" / "+" /
                        "-" / "/" /
                        "=" / "?" /
                        "^" / "_" /
                        "`" / "{" /
                        "|" / "}" /
                        "~"

The comments are there to explain what the code says (and for everyone else,
the "/" is basically an OR).  So atext is:

[a-zA-Z0-9!#$%&'*+-/=?^_`{|}~] (unescaped)

So then the regex becomes:

[a-zA-Z0-9!#$%&'*+-/=?^_`{|}~]+([.][a-zA-Z0-9!#$%&'*+-/=?^_`{|}~]+)*

Which when escaped is exactly what is there.
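
For anyone who wants to try it, here is a minimal Python sketch of the same dot-atom-text check, built from the atext list quoted from the RFC above. The names ATEXT, DOT_ATOM_TEXT and is_dot_atom_text are made up for illustration, and it assumes comments are not supported and that trimming the string is acceptable:

```python
import re

# atext per the grammar quoted above. "-" is placed last in the class so it
# is a literal hyphen rather than the start of a range.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"

# dot-atom-text = 1*atext *("." 1*atext)
DOT_ATOM_TEXT = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")


def is_dot_atom_text(local_part: str) -> bool:
    """Return True if the trimmed string is a bare dot-atom-text.

    Assumption: surrounding whitespace is trimmed and comments are rejected.
    """
    return DOT_ATOM_TEXT.fullmatch(local_part.strip()) is not None
```

With this, "paul.smith" and " paul " pass, while ".paul", "paul." and "pa..ul" are rejected because every "." must sit between runs of atext.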

> 
> That appears to suggest that dot-atom allows comments 
> containing folding whitespace (CFWS).
> 

It does yes.

> quoted-string allows:
> 
> [CFWS] DQUOTE *([FWS] qcontent) [FWS] DQUOTE [CFWS]
> 
> You can recurse back up through the various definitions of 
> each of those if you want, but clearly folding whitespace 
> (FWS) is permitted.

Agreed.

CFWS            =       *([FWS] comment) (([FWS] comment) / FWS)
FWS             =       ([*WSP CRLF] 1*WSP) /   ; Folding white space
                        obs-FWS
WSP            =  SP / HTAB   ; white space
SP             =  %x20 ; space char
HTAB           =  %x09 ; horizontal tab

From section 2.2.3:

"The general rule is that wherever this standard allows for folding white
space (not simply WSP characters), a CRLF may be inserted before any WSP"

So therefore FWS is basically whitespace that can have linebreaks in it.
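
As a rough Python sketch of the FWS rule quoted above (ignoring obs-FWS), something like the following would do. The names FWS and unfold are hypothetical, and the pattern only covers the non-obsolete form ([*WSP CRLF] 1*WSP):

```python
import re

# FWS = ([*WSP CRLF] 1*WSP); obs-FWS is deliberately left out (assumption).
FWS = re.compile(r"(?:[ \t]*\r\n)?[ \t]+")


def unfold(value: str) -> str:
    """Remove every CRLF that is immediately followed by a space or tab."""
    return re.sub(r"\r\n(?=[ \t])", "", value)
```

So a lone space is FWS, a CRLF followed by a tab is FWS, but a CRLF on its own is not, because at least one WSP has to follow the line break.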

So CFWS is commented whitespace. What's a comment? Here:

quoted-pair     =       ("\" text) / obs-qp
ctext           =       NO-WS-CTL /     ; Non white space controls
                        %d33-39 /       ; The rest of the US-ASCII
                        %d42-91 /       ;  characters not including "(" (40),
                        %d93-126        ;  ")" (41), or "\" (92)
ccontent        =       ctext / quoted-pair / comment
comment         =       "(" *([FWS] ccontent) [FWS] ")"
CFWS            =       *([FWS] comment) (([FWS] comment) / FWS)
qcontent        =       qtext / quoted-pair
qtext           =       NO-WS-CTL /     ; Non white space controls
                        %d33 /          ; The rest of the US-ASCII
                        %d35-91 /       ;  characters not including "\"
                        %d93-126        ;  or the quote character

Remembering what the NO-WS-CTL, text and obsolete bits are:

NO-WS-CTL       =       %d1-8 /         ; US-ASCII control characters
                        %d11 /          ;  that do not include the
                        %d12 /          ;  carriage return, line feed,
                        %d14-31 /       ;  and white space characters
                        %d127
text            =       %d1-9 /         ; Characters excluding CR and LF
                        %d11 /
                        %d12 /
                        %d14-127 /
                        obs-text
obs-qp          =       "\" (%d0-127)

So basically a "comment" is:

( anything I like in here whatsoever as long as it doesn't have bare brackets
or a bare backslash, with optional whitespace (which may fold across lines)
anywhere between the brackets, although it can contain another
comment ( like this ) or a quoted pair like this \) )

A quoted-pair ends by itself: it is a backslash followed by exactly one
character, so the \) above is just an escaped closing bracket.
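
Because comments nest, a plain regex can't match them, but a small depth counter can. Here is a hedged Python sketch that strips comments from a string. The function name strip_comments is made up, and it assumes the input has no quoted-string (brackets inside double quotes would be wrongly treated as comments):

```python
def strip_comments(s: str) -> str:
    """Remove RFC 2822 style comments, including nested ones.

    Assumption: the input contains no quoted-string, so every "(" opens a
    comment. Inside a comment, a backslash escapes exactly one character.
    """
    out = []
    depth = 0
    i = 0
    while i < len(s):
        ch = s[i]
        if depth and ch == "\\":
            i += 2  # quoted-pair: skip the backslash and the next character
            continue
        if ch == "(":
            depth += 1
        elif ch == ")" and depth:
            depth -= 1
        elif depth == 0:
            out.append(ch)  # only keep text that is outside every comment
        i += 1
    if depth:
        raise ValueError("unterminated comment")
    return "".join(out)
```

For example, strip_comments("(a (nested) comment)paul(with \\) inside)@example.com") gives "paul@example.com", where example.com is just a placeholder domain.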

So you could have an email like this:

[CFWS] dot-atom-text [CFWS]
Or
[CFWS] 1*atext *("." 1*atext) [CFWS]

So this could be:

( This is an email address )paul( an end &&^%%% to the "$%$% email address as a comment \sdfsdfsdf )@pjnetsolutions.com

Which you'll plainly agree is valid but completely silly!

Taking the local-part quoted-string idea you could have:

[CFWS] DQUOTE *([FWS] qcontent) [FWS] DQUOTE [CFWS]

 " " 

As a "valid" local part, or:

 " There is nothing to stop this being a valid email address "@pjnetsolutions.com

Which also follows the spec!
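
For completeness, a rough Python sketch of the quoted-string form, built from the qtext, quoted-pair and FWS definitions quoted above. The names QTEXT, QUOTED_PAIR, FWS and QUOTED_STRING are made up, and the sketch assumes the obsolete forms and the surrounding [CFWS] are left out:

```python
import re

# qtext = NO-WS-CTL / %d33 / %d35-91 / %d93-126
QTEXT = r"[\x01-\x08\x0b\x0c\x0e-\x1f\x7f\x21\x23-\x5b\x5d-\x7e]"
# quoted-pair = "\" text (obs-qp and obs-text are ignored here)
QUOTED_PAIR = r"\\[\x01-\x09\x0b\x0c\x0e-\x7f]"
# FWS = ([*WSP CRLF] 1*WSP), without obs-FWS
FWS = r"(?:(?:[ \t]*\r\n)?[ \t]+)"

# DQUOTE *([FWS] qcontent) [FWS] DQUOTE
QUOTED_STRING = re.compile(
    rf'"(?:{FWS}?(?:{QTEXT}|{QUOTED_PAIR}))*{FWS}?"'
)
```

QUOTED_STRING.fullmatch('" "') succeeds, as does the long sentence in quotes above, which is exactly why following the spec to the letter accepts such silly local parts.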

It all leads to a confusing and plainly quite silly regex when actually we
know pretty much what we want.  Although one day I might actually try and do
this!

Paul



