On Sat, Aug 31, 2002 at 08:20:33PM +0100, Matthew Toseland wrote:
> Looking at Parser.flex...
> 
> /* Non whitespace and not close of tag (right angle bracket), i.e. chars
>  * that would not cause an unquoted attribute to end */
> NONSEP=[^>\n\r\ \t\b\012:?]
> NONSEP_NOQUOTE=[^>\n\r\ \t\b\012:?"]
> 
> This I don't understand... "?" and ":" do not terminate the attribute
> (meaning the URL in an a href=<unquoted URL>). Presumably it is to reduce
> backtracking? Anyway, the proposed modifications are:
> 
> NONSEP=[^>\n\r\ \t\b\012:]
> NONSEP_NOQUOTE=[^>\n\r\ \t\b\012:"]
> 
> ......
> 
> /* Catch any colon or ?htl= within the URL */
> LINK_PATTERNS1={LINK_ATTRS}{WS}={WS}["][^":]*[:][^"]*
> LINK_PATTERNS2={LINK_ATTRS}{WS}={WS}({NONSEP_NOQUOTE}{NONSEP}*)?[:]{NONSEP}*
> LINK_PATTERNS3={LINK_ATTRS}{WS}={WS}["][^"?]*?htl=
> LINK_PATTERNS4={LINK_ATTRS}{WS}={WS}({NONSEP_NOQUOTE}{NONSEP}*)?htl=
> LINK_PATTERNS={LINK_PATTERNS1}|{LINK_PATTERNS2}|{LINK_PATTERNS3}|{LINK_PATTERNS4}
JFlex's handling of double quotes has changed, so the above is wrong. I
have a fixed version with every double quote escaped, even inside []
character classes, which is apparently what jflex 1.5.3 wants.
> 
> This should achieve the functionality we want: block all colons (if we
> want to change the port, we should encode it as
> __CHECKED_HTTP_hostname_port__ or something), allow ? unless it's part
> of a ?htl=... However, I could be grossly mistaken. Comments?

