Although it is not widely known (and web page authors routinely get it
wrong), two dashes begin a comment and another two dashes end it. So you can
switch back and forth several times in a single tag:
<!--comment--livestuff--comment--livestuff--comment...
As far as I know, actual use of this "feature" is quite rare, so I ended up
coding my parser to actual usage rather than to the spec. I suspect the folks
who write web browsers do the same thing more often than they "should".
Parsing HTML is hard enough. Parsing garbage reliably -- and matching other
people's undocumented parsers and undocumented language elements -- is nearly
impossible.
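
For illustration, here is a minimal sketch of the "to spec" toggling rule
described above, where every "--" inside a declaration flips comment mode. The
function name live_text and its interface are made up for this example; it is
not taken from HTML::Parser or any browser, and it ignores other details of
declarations such as quoted strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, for illustration only.
 * body is the text between "<!" and the closing ">" of a declaration.
 * Every "--" toggles comment mode; text outside comments is copied to out.
 * Output is truncated if it does not fit in out_size bytes.
 * Returns the number of bytes written, not counting the terminating NUL. */
size_t live_text(const char *body, char *out, size_t out_size)
{
    int in_comment = 0;
    size_t n = 0;
    size_t i = 0;

    assert(body != NULL && out != NULL && out_size > 0);

    while (body[i] != '\0') {
        if (strncmp(body + i, "--", 2) == 0) {
            in_comment = !in_comment;   /* "--" opens or closes a comment */
            i += 2;
            continue;
        }
        if (!in_comment && n + 1 < out_size)
            out[n++] = body[i];         /* keep only non-comment text */
        i++;
    }
    out[n] = '\0';
    return n;
}
```

Fed the body of the example tag above, this keeps the two "livestuff" runs and
drops the three comments. A parser coded to actual use would instead treat
everything up to the first "-->" as one comment.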
From: JRB%w3f.com@Internet on 2000-03-07 04:32 PM
To: kero%3sheep.com@Internet
cc: libwww%perl.org@Internet (bcc: Marvin Simkin)
Subject: Re: patch for HTML::Parser 3.06 to fix declaration comment handling
Why not just look for "-->" to end it?
-JB
la mouton wrote:
>
> This fixes a bug in declaration handling. HTML::Parser supports comments
> within declarations (<! foo -- comment -->) incorrectly. Once we trigger
> a comment "--" we look for the next instance of "--" to denote the end of
> the comment. I put in a check for the end-of-tag character '>'; otherwise
> we don't get out of comment mode before the appearance of another "--"
> marker.
>
> -f
>
> diff -u -r HTML-Parser-3.06/hparser.c HTML-Parser-3.06.fixed/hparser.c
> --- HTML-Parser-3.06/hparser.c Mon Mar 6 08:30:13 2000
> +++ HTML-Parser-3.06.fixed/hparser.c Tue Mar 7 11:24:54 2000
> @@ -792,8 +792,10 @@
> s++;
>
> while (1) {
> - while (s < end && *s != '-')
> + while (s < end && *s != '-' && *s != '>')
> s++;
> + if (*s == '>')
> + goto DONE;
> if (s == end)
> goto PREMATURE;
> s++;
> @@ -824,7 +826,8 @@
> if (s == end)
> goto PREMATURE;
> if (*s == '>') {
> - s++;
> + DONE:
> + s++;
> report_event(p_state, E_DECLARATION, beg, s, tokens, num_tokens,
> offset, self);
> FREE_TOKENS;
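
To make the idea of the patch easier to follow outside the diff, here is a
rough standalone sketch of the same scan: while inside a declaration comment,
stop at either the closing "--" or the end-of-tag character '>'. The function
skip_decl_comment and its tag_closed flag are invented for this example and
are not part of hparser.c. Unlike the posted hunk, the sketch tests the buffer
bound before it looks at *s.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not the actual hparser.c code.
 * s points just past the "--" that opened a comment inside a declaration;
 * end points one past the last buffered byte.
 * Returns a pointer just past the closing "--", or just past '>' if the
 * tag ends first (*tag_closed is then set to 1). Returns NULL if the
 * buffer runs out before either is seen (the PREMATURE case). */
const char *skip_decl_comment(const char *s, const char *end, int *tag_closed)
{
    assert(s != NULL && end != NULL && s <= end && tag_closed != NULL);
    *tag_closed = 0;

    while (1) {
        while (s < end && *s != '-' && *s != '>')
            s++;
        if (s == end)
            return NULL;            /* need more input */
        if (*s == '>') {
            *tag_closed = 1;        /* tag ended while still in the comment */
            return s + 1;
        }
        s++;                        /* skip the first '-' */
        if (s == end)
            return NULL;
        if (*s == '-')
            return s + 1;           /* found the closing "--" */
        /* a lone '-': keep scanning from the current character */
    }
}
```

A caller would treat a set tag_closed flag the way the patch treats its DONE
label, that is, report the declaration and carry on after the '>'.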