On Sat, Aug 31, 2002 at 08:20:33PM +0100, Matthew Toseland wrote:
> Looking at Parser.flex...
>
> /* Non whitespace and not close of tag (right angle bracket). I.e.
>  * chars that
>  * would not cause an unquoted attribute to end */
> NONSEP=[^>\n\r\ \t\b\012:?]
> NONSEP_NOQUOTE=[^>\n\r\ \t\b\012:?"]
>
> This I don't understand... "?" or ":" do not terminate the attribute
> (meaning the URL in an a href=<unquoted URL>. Presumably it is to reduce
> backtracking? Anyway, the proposed modifications are:
>
> NONSEP=[^>\n\r\ \t\b\012:]
> NONSEP_NOQUOTE=[^>\n\r\ \t\b\012:"]
>
> ......
>
> /* Catch any colon or ?htl= within the URL */
> LINK_PATTERNS1={LINK_ATTRS}{WS}={WS}["][^":]*[:][^"]*
> LINK_PATTERNS2={LINK_ATTRS}{WS}={WS}({NONSEP_NOQUOTE}{NONSEP}*)?[:]{NONSEP}*
> LINK_PATTERNS3={LINK_ATTRS}{WS}={WS}["][^"?]*?htl=
> LINK_PATTERNS4={LINK_ATTRS}{WS}={WS}({NONSEP_NOQUOTE}{NONSEP}*)?htl=
> LINK_PATTERNS={LINK_PATTERNS1}|{LINK_PATTERNS2}|{LINK_PATTERNS3}|{LINK_PATTERNS4}

JFlex's handling of "'s has changed... so the above is wrong. I have a
fixed version, with all the "'s escaped, even inside []'s, which is
apparently what jflex 1.5.3 wants.

>
> This should achieve the functionality we want: block all colons (if we
> want to change the port, we should encode it as
> __CHECKED_HTTP_hostname_port__ or something), allow ? unless it's part
> of a ?htl=... However, I could be grossly mistaken. Comments?
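
For anyone who wants to poke at the intended behaviour without running
JFlex, here is a rough sketch of the proposed patterns in plain
java.util.regex. It is not the fixed Parser.flex, only an approximation
of the stated goal (catch any colon, allow ? unless it begins ?htl=).
The LINK_ATTRS and WS definitions are stand-ins I am assuming, since
the real macros are not quoted above, and the class and method names
are made up for the example. The literal ? before htl= is escaped
explicitly here, which is my reading of the comment rather than of the
spec text.

```java
import java.util.regex.Pattern;

class LinkPatternSketch {
    // Assumed stand-ins: the real LINK_ATTRS and WS macros are not shown in the mail.
    static final String LINK_ATTRS = "(?:href|src)";
    static final String WS = "[ \\t\\r\\n]*";

    // Proposed NONSEP: anything except '>', whitespace, backspace or ':'.
    // '?' is no longer excluded, so it may appear in an unquoted value.
    static final String NONSEP = "[^>\\n\\r \\t\\x08:]";
    static final String NONSEP_NOQUOTE = "[^>\\n\\r \\t\\x08:\"]";

    static final Pattern LINK_PATTERNS = Pattern.compile(
        LINK_ATTRS + WS + "=" + WS + "(?:"
            // 1: quoted value containing a colon
            + "\"[^\":]*:[^\"]*"
            // 2: unquoted value containing a colon
            + "|(?:" + NONSEP_NOQUOTE + NONSEP + "*)?:" + NONSEP + "*"
            // 3: quoted value whose first '?' starts ?htl=
            + "|\"[^\"?]*\\?htl="
            // 4: unquoted value containing ?htl=
            + "|(?:" + NONSEP_NOQUOTE + NONSEP + "*)?\\?htl="
            + ")",
        Pattern.CASE_INSENSITIVE);

    // True if the tag text has a link attribute the filter should catch.
    static boolean needsFiltering(String tagText) {
        return LINK_PATTERNS.matcher(tagText).find();
    }
}
```

With these assumptions, needsFiltering flags <a href="http://example.com/">
and <a href=page?htl=5>, but lets <a href=/local/page?x=1> through.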