Although it is not widely known (and web page authors break this rule routinely),
inside an SGML declaration (<! ... >) two dashes begin a comment and another two
dashes end it. So you can switch back and forth several times in a single tag:

<!--comment--livestuff--comment--livestuff--comment...
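A minimal C sketch of that rule, assuming the whole declaration is already in memory and ignoring quoted literals (which a real SGML parser would also have to handle). Every "--" flips between comment text and live text, and only a '>' seen in live text closes the declaration. The function name sgml_decl_end is hypothetical.

```c
#include <stddef.h>

/* Strict SGML reading (sketch): inside <! ... >, each "--" toggles
   between comment text and live text; only a '>' outside a comment
   closes the declaration. Quoted literals are NOT handled here.
   Returns the offset just past the closing '>', or -1 if the input
   ends first (or does not start with "<!"). */
long sgml_decl_end(const char *s, size_t len)
{
    size_t i = 2;          /* skip the leading "<!" */
    int in_comment = 0;

    if (len < 2 || s[0] != '<' || s[1] != '!')
        return -1;

    while (i < len) {
        if (s[i] == '-' && i + 1 < len && s[i + 1] == '-') {
            in_comment = !in_comment;   /* each "--" flips the mode */
            i += 2;
        } else if (s[i] == '>' && !in_comment) {
            return (long)(i + 1);       /* '>' in live text ends it */
        } else {
            i++;                        /* ordinary character */
        }
    }
    return -1;
}
```

Under this reading "<!--a--b--c-->" is one declaration, while "<!-- x -- y -->" is not finished at its final '>', because the third "--" reopens a comment. That second form is exactly what authors write expecting it to close.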

As far as I know, actual use of this "feature" is quite rare, so I ended up
coding my parser to match actual usage rather than the spec. I suspect the folks
who write web browsers do the same thing more often than they "should".

Parsing HTML is hard enough. Parsing garbage reliably -- and matching other 
people's undocumented parsers and undocumented language elements -- is nearly 
impossible.

From:   JRB%w3f.com@Internet on 2000-03-07 04:32 PM
To:     kero%3sheep.com@Internet
cc:     libwww%perl.org@Internet (bcc: Marvin Simkin)
Subject:        Re: patch for HTML::Parser 3.06 to fix declaration comment handling

Why not just look for "-->" to end it?
-JB
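JB's suggestion amounts to a different rule: once "<!--" opens a comment, skip straight to the first "-->" and ignore any "--" in between. A sketch in C, assuming the input begins with "<!--" and that search for the closer starts after those four characters; the name html_comment_end is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* "Look for -->" reading (sketch): a comment opened by "<!--" runs to
   the first "-->" after the opener, however many "--" pairs sit in
   between. Returns the offset just past "-->", or -1 if the input does
   not start with "<!--" or contains no closing "-->". */
long html_comment_end(const char *s, size_t len)
{
    size_t i;

    if (len < 4 || memcmp(s, "<!--", 4) != 0)
        return -1;

    /* i + 3 <= len keeps s[i + 2] inside the buffer */
    for (i = 4; i + 3 <= len; i++) {
        if (s[i] == '-' && s[i + 1] == '-' && s[i + 2] == '>')
            return (long)(i + 3);
    }
    return -1;
}
```

With this rule "<!-- x -- y -->" ends at its last character, which is what most authors intend, at the cost of no longer supporting the strict alternation of comment and live text.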

la mouton wrote:
> 
> This fixes a bug in declaration handling.  HTML::Parser handles comments
> within declarations (<! foo -- comment -->) incorrectly.  Once a "--"
> opens a comment, we look for the next instance of "--" to mark the end of
> the comment.  I put in a check for the end-of-tag character '>'; otherwise
> we don't leave comment mode until another "--" marker appears.
> 
> -f
> 
> diff -u -r HTML-Parser-3.06/hparser.c HTML-Parser-3.06.fixed/hparser.c
> --- HTML-Parser-3.06/hparser.c  Mon Mar  6 08:30:13 2000
> +++ HTML-Parser-3.06.fixed/hparser.c    Tue Mar  7 11:24:54 2000
> @@ -792,8 +792,10 @@
>         s++;
> 
>         while (1) {
> -         while (s < end && *s != '-')
> +         while (s < end && *s != '-' && *s != '>')
>             s++;
> +          if (*s == '>')
> +              goto DONE;
>           if (s == end)
>             goto PREMATURE;
>           s++;
> @@ -824,7 +826,8 @@
>      if (s == end)
>        goto PREMATURE;
>      if (*s == '>') {
> -      s++;
> +      DONE:
> +        s++;
>        report_event(p_state, E_DECLARATION, beg, s, tokens, num_tokens,
>                    offset, self);
>        FREE_TOKENS;
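The change to the inner comment loop can be sketched on its own. This is an illustration, not the hparser.c code: the enum and function names are hypothetical, it works on offsets into a length-delimited buffer instead of the s/end pointers the real parser uses, and its treatment of a lone '-' as ordinary comment text is an assumption. Unlike the hunk, it tests for end of input before it looks at the current character.

```c
#include <stddef.h>

/* Hypothetical outcomes of scanning one comment inside <! ... >. */
enum comment_scan { SCAN_COMMENT_CLOSED, SCAN_DECL_DONE, SCAN_PREMATURE };

/* Sketch of the patched loop: starting just after an opening "--",
   look for the closing "--", but stop at a '>' as the patch does.
   On success *pos receives the offset just past "--"
   (SCAN_COMMENT_CLOSED) or just past '>' (SCAN_DECL_DONE). */
enum comment_scan scan_decl_comment(const char *s, size_t len,
                                    size_t start, size_t *pos)
{
    size_t i = start;

    for (;;) {
        /* skip ordinary comment text */
        while (i < len && s[i] != '-' && s[i] != '>')
            i++;
        if (i == len)                 /* end checked before s[i] */
            return SCAN_PREMATURE;
        if (s[i] == '>') {            /* the patch's new exit */
            *pos = i + 1;
            return SCAN_DECL_DONE;
        }
        i++;                          /* step past the first '-' */
        if (i == len)
            return SCAN_PREMATURE;
        if (s[i] == '-') {            /* "--" closes the comment */
            *pos = i + 1;
            return SCAN_COMMENT_CLOSED;
        }
        /* a lone '-' is comment text; keep scanning */
    }
}
```

Given "<!-- a > b -->" and a start just after "<!--", the sketch stops at the first '>' with SCAN_DECL_DONE, whereas the strict SGML reading would keep skipping to the final '>'.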
