Yes, this breaks on the front page of http://www.kozmo.com, where intended
comments such as "<! row1 -->" get parsed incorrectly by HTML::Parser.
HTML::Parser interprets the HTML text between <! row1 --> and <! row2 --> as
one huge declaration.  This is because HTML::Parser looks only for
the matching "--" and not for the end-of-tag character '>'.
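
To make the difference concrete, here is a rough, simplified model of the
two scanning strategies. It is not the code from hparser.c: decl_length()
and its stop_at_gt_in_comment flag are hypothetical names invented for this
sketch, and it only understands "--" comment delimiters, not quoted
strings or anything else a real declaration may contain.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of HTML::Parser.
 *
 * Returns the length of the declaration that starts at buf (which must
 * begin with "<!"), counting the closing '>', or 0 if buf does not start
 * a declaration or no end is found.
 *
 * stop_at_gt_in_comment == 0: a "--" opens a comment that lasts until the
 *   next "--"; any '>' seen in between is treated as comment text.
 * stop_at_gt_in_comment != 0: a '>' always ends the declaration, even when
 *   a comment is still open (the behaviour the patch aims for).
 */
size_t decl_length(const char *buf, int stop_at_gt_in_comment)
{
    const char *s = buf;
    const char *end = buf + strlen(buf);
    int in_comment = 0;

    if (end - s < 2 || s[0] != '<' || s[1] != '!')
        return 0;
    s += 2;

    while (s < end) {
        /* Each "--" toggles comment mode on or off. */
        if (s + 1 < end && s[0] == '-' && s[1] == '-') {
            in_comment = !in_comment;
            s += 2;
            continue;
        }
        /* '>' ends the declaration outside a comment, or always if asked. */
        if (*s == '>' && (!in_comment || stop_at_gt_in_comment))
            return (size_t)(s - buf) + 1;
        s++;
    }
    return 0;   /* ran out of input before the declaration ended */
}
```

With the input "<! row1 -->text<! row2 -->more", the first mode swallows
everything through the second "-->" (26 characters), while the second mode
stops after "<! row1 -->" (11 characters). The trade-off is that the second
mode also cuts a well-formed comment such as "<!-- a > b -->" short at the
embedded '>'.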

-f

On 7 Mar 2000, Gisle Aas wrote:

> la mouton <[EMAIL PROTECTED]> writes:
> 
> > This fixes a bug in declaration handling.  HTML::Parser supports comments
> > within declarations (<! foo -- comment -->) incorrectly.  Once we trigger
> > a comment "--" we look for the next instance of "--" to denote the end of
> > the comment.  I put in a check for the end of tag character '>', otherwise
> > we don't get out of comment mode before the appearance of another "--"
> > marker.
> 
> I think this is wrong.  The ">" character inside comments should be
> perfectly legal.  Comment mode should last until "--".
> 
> Did you find actual HTML-documents that were mis-parsed?
> 
> Regards,
> Gisle
> 
> > 
> > -f
> > 
> > diff -u -r HTML-Parser-3.06/hparser.c HTML-Parser-3.06.fixed/hparser.c
> > --- HTML-Parser-3.06/hparser.c      Mon Mar  6 08:30:13 2000
> > +++ HTML-Parser-3.06.fixed/hparser.c        Tue Mar  7 11:24:54 2000
> > @@ -792,8 +792,10 @@
> >     s++;
> >  
> >     while (1) {
> > -     while (s < end && *s != '-')
> > +     while (s < end && *s != '-' && *s != '>')
> >         s++;
> > +          if (*s == '>')
> > +              goto DONE;
> >       if (s == end)
> >         goto PREMATURE;
> >       s++;
> > @@ -824,7 +826,8 @@
> >      if (s == end)
> >        goto PREMATURE;
> >      if (*s == '>') {
> > -      s++;
> > +      DONE:
> > +        s++;
> >        report_event(p_state, E_DECLARATION, beg, s, tokens, num_tokens,
> >                offset, self);
> >        FREE_TOKENS;
> > 
> 
