I would not consider this a reason to call HTML::Parser broken.  The current
behavior is in accordance with the standard.

It might make sense to add this to the list of things that are relaxed by
$parser->strict_comment(0), but the real right answer is to politely point
out the invalid comments to the offending site's webmaster.
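
For anyone comparing the two behaviors discussed in this thread, here is a
minimal sketch in C. It is not the hparser.c code. It assumes the input is
NUL-terminated and starts with "<!", and the names decl_end_strict and
decl_end_lenient are made up for illustration. The first follows the rule
that each "--" toggles comment mode; the second stops at the first ">", as
browsers are reported to do.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Strict rule: inside "<! ... >", every "--" toggles comment mode, and
 * '>' closes the declaration only while we are outside a comment.
 * Returns the offset just past the closing '>', or 0 if the input ends
 * before the declaration is closed. */
static size_t decl_end_strict(const char *s)
{
    size_t len = strlen(s);
    size_t i = 2;                       /* skip the leading "<!" */
    int in_comment = 0;

    assert(len >= 2 && s[0] == '<' && s[1] == '!');

    while (i < len) {
        if (s[i] == '-' && i + 1 < len && s[i + 1] == '-') {
            in_comment = !in_comment;   /* "--" opens or closes a comment */
            i += 2;
        } else if (s[i] == '>' && !in_comment) {
            return i + 1;               /* end of the declaration */
        } else {
            i++;
        }
    }
    return 0;                           /* ran out of input */
}

/* Lenient rule: dashes are ignored and the first '>' ends the
 * declaration. Same return convention as above. */
static size_t decl_end_lenient(const char *s)
{
    size_t len = strlen(s);
    size_t i;

    assert(len >= 2 && s[0] == '<' && s[1] == '!');

    for (i = 2; i < len; i++) {
        if (s[i] == '>')
            return i + 1;
    }
    return 0;
}
```

With these, "<! row1 -->" has no end under the strict rule, because the "--"
opens a comment that is never closed, while the lenient rule ends it at the
">". A well-formed "<!-- ok -->" ends at the same place under both.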
--
Mac :})
----- Original Message -----
From: "la mouton" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: "Marvin Simkin" <[EMAIL PROTECTED]>; "kero%3sheep.com"
<[EMAIL PROTECTED]>; "libwww%perl.org" <[EMAIL PROTECTED]>
Sent: Wednesday, March 08, 2000 19:21
Subject: Re: patch for HTML::Parser 3.06 to fix declaration commenthandling


> this is what I experienced also.  Comments like "<! row1 -->" get treated
> like comments by browsers and HTML::Parser should behave the same way.
>
> -f
>
> On Tue, 7 Mar 2000, John Beaman wrote:
>
> > What you say may be true, as I have heard this from
> > more than one source.  Seems that Web Browsers use
> > the "<!" and the next ">" to delimit comments, and
> > do not follow the convention you mention.  This can
> > be tested with things like:
> >   "<! this ---- is -- a -- test -->"
> > or
> >   "<!-- this is a test>"
> > where you will see that example 1 and 2 "browses" OK.
> > The "dashes" are ignored!  Browsers look for ONLY the
> > beginning "<!" and the ending ">" and that's all.
> > -JB
> >
> > Marvin Simkin wrote:
> > >
> > > > Although it is not widely known (and web page authors break this routinely)
> > > > two dashes begins a comment and another two dashes ends it. So you can switch
> > > > back and forth several times in a single tag:
> > >
> > > <!--comment--livestuff--comment--livestuff--comment...
> > >
> > > > As far as I know actual use of this "feature" is quite rare. So I ended up
> > > > coding my parser to actual use rather than to spec. I suspect the folks that
> > > > write web browsers do the same thing more often than they "should".
> > > >
> > > > Parsing HTML is hard enough. Parsing garbage reliably -- and matching other
> > > > people's undocumented parsers and undocumented language elements -- is nearly
> > > > impossible.
> > >
> > > From:   JRB%w3f.com@Internet on 2000-03-07 04:32 PM
> > > To:     kero%3sheep.com@Internet
> > > cc:     libwww%perl.org@Internet (bcc: Marvin Simkin)
> > > Subject:        Re: patch for HTML::Parser 3.06 to fix declaration
> > > commenthandling
> > >
> > > Why not just look for  "-->" to end it?
> > > -JB
> > >
> > > la mouton wrote:
> > > >
> > > > > This fixes a bug in declaration handling.  HTML::Parser supports comments
> > > > > within declarations (<! foo -- comment -->) incorrectly.  Once we trigger
> > > > > a comment "--" we look for the next instance of "--" to denote the end of
> > > > > the comment.  I put in a check for the end of tag character '>', otherwise
> > > > > we don't get out of comment mode before the appearance of another "--"
> > > > > marker.
> > > >
> > > > -f
> > > >
> > > > > diff -u -r HTML-Parser-3.06/hparser.c HTML-Parser-3.06.fixed/hparser.c
> > > > --- HTML-Parser-3.06/hparser.c  Mon Mar  6 08:30:13 2000
> > > > +++ HTML-Parser-3.06.fixed/hparser.c    Tue Mar  7 11:24:54 2000
> > > > @@ -792,8 +792,10 @@
> > > >         s++;
> > > >
> > > >         while (1) {
> > > > -         while (s < end && *s != '-')
> > > > +         while (s < end && *s != '-' && *s != '>')
> > > >             s++;
> > > > +          if (*s == '>')
> > > > +              goto DONE;
> > > >           if (s == end)
> > > >             goto PREMATURE;
> > > >           s++;
> > > > @@ -824,7 +826,8 @@
> > > >      if (s == end)
> > > >        goto PREMATURE;
> > > >      if (*s == '>') {
> > > > -      s++;
> > > > +      DONE:
> > > > +        s++;
> > > >        report_event(p_state, E_DECLARATION, beg, s, tokens,
num_tokens,
> > > >                    offset, self);
> > > >        FREE_TOKENS;
> >
>
