Re: Regular expression anyone?

Bill Barry Fri, 18 Dec 2009 07:51:00 -0800

There are many more valid chars for filenames within urls than this 
though, which I think makes the whole point moot.
At the very least I know you need the following (I have used all of 
them fairly recently):
[a-zA-Z0-9_%.?&=-]
at which point you might as well simply define the class by the chars 
that are not allowed:
[^'")]
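
One caveat when writing such a class by hand: a hyphen between two characters is read as a range, so a fragment like `_-%` means "from _ to %", which is a reversed range and is rejected by most engines. Below is a minimal sketch in Python (an assumption; the thread targets .NET, where the character class syntax is the same) showing the rejection and the two class styles. The names `allowed`, `not_forbidden` and `is_plain` are made up for illustration.

```python
import re

# A hyphen between two characters is parsed as a range, so "_-%" is read as
# "from _ (0x5F) to % (0x25)", which is reversed and rejected by Python's re.
try:
    re.compile(r"[a-zA-Z0-9_-%.?&=]")
    range_error = None
except re.error as exc:
    range_error = str(exc)

# With the hyphen placed last it is a literal character.
allowed = re.compile(r"[a-zA-Z0-9_%.?&=-]+")

# The same idea expressed by exclusion: anything except quotes and ")".
not_forbidden = re.compile(r"[^'\")]+")


def is_plain(name):
    """Return True if every character of name is in the whitelist class."""
    return allowed.fullmatch(name) is not None
```

The whitelist still rejects characters such as `/` and space, which is the argument for the exclusion form.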


As I understand it, though, these are allowed if you are properly 
following the spec:
URI              url\({w}{string}{w}\)
                 |url\({w}([!#$%&*-~]|{nonascii}|{escape})*{w}\)
nonascii         [^\0-\177]
escape           {unicode}|\\[^\n\r\f0-9a-f]
unicode          \\[0-9a-f]{1,6}(\r\n|[ \n\r\t\f])?
string           {string1}|{string2}
string1          \"([^\n\r\f\\"]|\\{nl}|{escape})*\"
string2          \'([^\n\r\f\\']|\\{nl}|{escape})*\'
nl               \n|\r\n|\r|\f
w                [ \t\r\n\f]*

http://www.w3.org/TR/CSS21/syndata.html#tokenization

which means that the true regex should be:
(?:url\([ \t\r\n\f]*(?:(?:"(?<Url>(?:[^\n\r\f\\"]|\\(?:\n|\r\n|\r|\f)|(?:\\[0-9a-f]{1,6}(?:\r\n|[ \n\r\t\f])?))*)")|(?:'(?<Url>(?:[^\n\r\f\\']|\\(?:\n|\r\n|\r|\f)|(?:\\[0-9a-f]{1,6}(?:\r\n|[ \n\r\t\f])?))*)'))[ \t\r\n\f]*\))|(?:url\([ \t\r\n\f]*(?<Url>(?:[!#$%&*-~]|[^\0-\177]|(?:(?:\\[0-9a-f]{1,6}(?:\r\n|[ \n\r\t\f])?)|\\[^\n\r\f0-9a-f]))*)[ \t\r\n\f]*\))

However, I know that this is in fact wrong, because it fails to match 
simple unquoted urls that are plain text filenames. Either the spec has 
a bug here or I am translating it to a regex incorrectly. I suspect the 
spec is wrong: it glosses over the whole parsing section much more than 
it should, and it also conflicts with the url section of the spec: 
http://www.w3.org/TR/CSS21/syndata.html#uri.
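
One way to double-check a hand translation of the grammar is to assemble the pattern from the named macros, so each piece can be compared against the spec line by line. The sketch below does that in Python rather than .NET, so it is only an approximation of what the .NET engine would do. Assumptions: `css_url`, `dq`, `sq` and `bare` are made-up names; three separate groups stand in for the single `Url` group because Python's `re` does not allow duplicate group names; `\0-\177` is written as `\x00-\x7f`; and the full `{escape}` macro is used inside strings, as in the grammar.

```python
import re

# Macros from the CSS 2.1 tokenizer grammar, one Python constant per macro.
NL = r"(?:\n|\r\n|\r|\f)"
W = r"[ \t\r\n\f]*"
NONASCII = r"[^\x00-\x7f]"  # assumption: octal \0-\177 rewritten as hex
UNICODE = r"\\[0-9a-f]{1,6}(?:\r\n|[ \n\r\t\f])?"
ESCAPE = "(?:" + UNICODE + r"|\\[^\n\r\f0-9a-f])"

# string1 / string2, each capturing the text between the quotes.
STRING1 = r'"(?P<dq>(?:[^\n\r\f\\"]|\\' + NL + "|" + ESCAPE + r')*)"'
STRING2 = r"'(?P<sq>(?:[^\n\r\f\\']|\\" + NL + "|" + ESCAPE + r")*)'"

# The unquoted alternative of the URI token.
BARE = r"(?P<bare>(?:[!#$%&*-~]|" + NONASCII + "|" + ESCAPE + ")*)"

URI = re.compile(
    r"url\(" + W + "(?:" + STRING1 + "|" + STRING2 + "|" + BARE + ")" + W + r"\)",
    re.IGNORECASE,
)


def css_url(text):
    """Return the raw (still escaped) url of the first URI token, or None."""
    match = URI.search(text)
    if match is None:
        return None
    # Exactly one of the three groups takes part in a successful match.
    for value in match.group("dq", "sq", "bare"):
        if value is not None:
            return value
    return None
```

In this sketch `css_url("background-image: url(shadow-c.png);")` returns `shadow-c.png`, since the range `*-~` covers letters, digits, `-` and `.`. That may help narrow down whether a failed match comes from the grammar itself or from the translation.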

Tim Barcz wrote:
> Sorry...need to put in context
>
> [a-z|A-Z|/|\.|-]
> (and I find I mistyped it) in the above character class [a-z|A-Z] => 
> [a-zA-Z]
>
> Sorry,
>
> Tim
>
> On Fri, Dec 18, 2009 at 5:56 AM, Ken Egozi <[email protected]> wrote:
>
>     [a-z][A-Z] => [a-zA-Z]
>     Are you sure?
>     The first will match two letters, the first lowercase and the second
>     capital, while the second will match a single letter, either lowercase
>     or capital.
>
>
>
>     On Fri, Dec 18, 2009 at 8:40 AM, Tim Barcz <[email protected]> wrote:
>
>         What about data scheme?
>
>
>         On Thu, Dec 17, 2009 at 11:52 PM, Bill Barry <[email protected]>
>         wrote:
>
>             I would use one of the following (altering them some to
>             use in .net strings of course):
>
>             ^.*?url\(\s*(?<quote>["']?)(?<Url>(?!https?:|/)[^"')]+?)\k<quote>\s*\).*?$
>
>             options: ignorecase, multiline
>
>             or
>
>             url\(\s*(?<quote>["']?)(?<Url>(?!https?:|/)[^"')]+?)\k<quote>\s*\)
>
>             options: ignorecase
>             depending on whether you want the whole line or not; the
>             second is probably better because you technically can have
>             more than one url on a line:
>             UL { background-image: url(shadow-c.png);
>             list-style-image: url(bullet.png); } /* perfectly valid
>             css rule, under the first case you would capture only one
>             of the urls, but the second you could see both */
>
>             These regexes may still be missing some valid urls and
>             capturing some invalid ones, because I am pretty sure there
>             are some sophisticated escaping rules in play for such urls
>             which I am outright ignoring.
>
>             testcases (including ones that fail previous posted regexes):
>             background-image: url(images/default/shadow-c.png); /*valid*/
>             background-image: url(shadow-c.png); /*valid*/
>             background-image: url(../images/default/shadow-c.png);
>             /*valid*/
>             background-image: url('../images/icons/file-xslx.gif')
>             !important; /*valid*/
>             background-image: url("../images/icons/file-xslx.gif")
>             !important; /*valid*/
>             background-image: url('http-header.gif') !important; /*valid*/
>             background-image: url('http_header.gif') !important; /*valid*/
>             background-image: url( 'http_header.gif') !important;
>             /*valid*/
>             background-image: url( 'http_header.gif' ) !important;
>             /*valid*/
>             background-image: url ( 'http_header.gif' ) !important;
>             /*space between url and ( not valid [at least according to
>             firefox 3.5]*/
>             background-image: url('../images/icons/file-xslx.gif")
>             !important; /*non-matching quotes*/
>             background-image: url(../images/icons/file-xslx.gif")
>             !important; /*missing start quote, might still be valid
>             depending on char escaping rules to look for the file
>             'file-xslx.gif"'*/
>             background-image: url("../images/icons/file-xslx.gif)
>             !important; /*missing end quote*/
>             background-image: url(/images/icons/file-xslx.gif)
>             !important; /*absolute url*/
>             background-image: url('/images/icons/file-xslx.gif')
>             !important; /*absolute url*/
>             background-image: url("/images/icons/file-xslx.gif")
>             !important; /*absolute url*/
>             background-image:
>             url('http://example.com/images/icons/file-xslx.gif')
>             !important; /*absolute url*/
>             background-image:
>             url(http://example.com/images/icons/file-xslx.gif)
>             !important; /*absolute url*/
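
The second pattern and the test cases above can be exercised mechanically. Below is a minimal sketch in Python rather than .NET (an assumption: Python spells the named groups `(?P<name>...)` and the backreference `(?P=quote)`, where .NET uses `(?<name>...)` and `\k<quote>`). `relative_urls` is a made-up helper name.

```python
import re

# Python spelling of the second pattern: optional opening quote, a url that
# does not start with http:, https: or /, then the same quote again.
URL_RE = re.compile(
    r"""url\(\s*(?P<quote>["']?)(?P<Url>(?!https?:|/)[^"')]+?)(?P=quote)\s*\)""",
    re.IGNORECASE,
)


def relative_urls(css):
    """Return every relative url captured by the pattern, in order."""
    return [match.group("Url") for match in URL_RE.finditer(css)]
```

Using `finditer` rather than a whole-line match is what lets both urls of a rule like `UL { background-image: url(shadow-c.png); list-style-image: url(bullet.png); }` be seen.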
>
>
>             Additional testcases I didn't bother with:
>
>             background-image: url( \(.gif ) !important; /*valid,
>             filename = (.gif */
>             background-image: url( \).gif ) !important; /*valid,
>             filename = ).gif */
>             background-image: url( \'.gif ) !important; /*valid,
>             filename = '.gif */
>             background-image: url( \".gif ) !important; /*valid,
>             filename = ".gif */
>             background-image: url( \ .gif ) !important; /*valid,
>             filename = " .gif" */
>             background-image: url( \
>             .gif ) !important; /*valid (newline is part of the
>             filename) only when served with unix line endings
>             (filename would be invalid in windows line endings because
>             not both \r and \n are escaped here)*/
>             background-image: url(
>             a.gif ) !important; /*valid (newline is not part of the
>             filename)*/
>             background-image: url( '\(.gif' ) !important; /*valid,
>             filename = \(.gif */
>             background-image: url( '\).gif' ) !important; /*valid,
>             filename = \).gif */
>             background-image: url( '\'.gif' ) !important; /*valid,
>             filename = \'.gif */
>             background-image: url( '\".gif' ) !important; /*valid,
>             filename = \".gif */
>             background-image: url( '\ .gif' ) !important; /*valid,
>             filename = "\ .gif"*/
>             background-image: url( '\
>             .gif' ) !important; /*valid (newline and \ are part of the
>             filename, filename would be \\\n.gif if served with unix
>             line endings, \\\r\n.gif with windows line endings)*/
>
>
>             These are valid css rules, assuming that the filename is a
>             valid URI according to http://www.ietf.org/rfc/rfc3986
>             after css has taken care of the escape chars. Developing a
>             correct regex for rfc 3986 is a job suited only for a
>             regex engine like that of Perl 6 (it is a
>             non-deterministic context-sensitive grammar which makes it
>             unsuited for any regex language that has comparable
>             capabilities to Perl 5).
>
>
>             James Curran wrote:
>>             Oops.. Sorry, insufficient test cases...
>>
>>             These should match as well.
>>
>>             background-image: url(images/default/shadow-c.png);
>>             background-image: url(shadow-c.png);
>>
>>             Also, the url itself needs to be placed into a named capture
>>             called "Url".
>>
>>             Also, as a style note, in your patterns, you've written
>>             "[a-z|A-Z|/|\.|-]".  Inside the group brackets, the "or" is
>>             assumed.
>>             That should be [a-zA-Z/\.-].  As you wrote it, it would match a
>>             literal vertical pipe character, which would be wrong.
>>
>>             On Thu, Dec 17, 2009 at 2:45 PM, Leonardo Lima
>>             <[email protected]> wrote:
>>               
>>>             Hi,
>>>
>>>             Here I tested at Regex Buddy and it worked only for the first 3
>>>             entries, sorry.
>>>             I think that I don't understand your question...
>>>                 
>>               
>
>             --
>
>             You received this message because you are subscribed to
>             the Google Groups "Castle Project Development List" group.
>
>             To post to this group, send email to
>             [email protected]
>             <mailto:[email protected]>.
>             To unsubscribe from this group, send email to
>             [email protected]
>             <mailto:castle-project-devel%[email protected]>.
>             For more options, visit this group at
>             http://groups.google.com/group/castle-project-devel?hl=en.
>
>
>
>
>         -- 
>         Tim Barcz
>         Microsoft C# MVP
>         Microsoft ASPInsider
>         http://timbarcz.devlicio.us
>         http://www.twitter.com/timbarcz
>
>
>
>
>
>     -- 
>     Ken Egozi.
>     http://www.kenegozi.com/blog
>     http://www.delver.com
>     http://www.musicglue.com
>     http://www.castleproject.org
>     http://www.idcc.co.il - הכנס הקהילתי הראשון למפתחי דוטנט - בואו
>     בהמוניכם
>
>
>
>
>
>

