Re: bug in HTML::TokeParser 3.02

Gisle Aas Fri, 14 Jan 2000 06:06:13 -0800
la mouton <[EMAIL PROTECTED]> writes:

> There is a bug in hparser.c [line 519] in the handling of HTML comments.
> 
> 518      if (s < end) {
> 519        s = token_pos.end + 2;
> 520        goto LOCATE_END;
> 
> On line 519, we are looking for the end of an HTML comment ('-->') and
> advancing by 2 when we should be advancing by 1.  This bug comes to
> life when there is an odd number of '-' characters in an HTML comment.
> 
> line 519 should be:
> 
> s = token_pos.end + 1 ;
> 
> I patched it on my distribution and it now behaves correctly.

Oops.  I will upload 3.03 with this fix today.
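
To illustrate the off-by-one reported above, here is a minimal sketch of a
comment-end scan. It is a simplified model and not the actual hparser.c
code: the function name find_comment_end and its step parameter are
hypothetical, introduced only to compare advancing by 2 with advancing by 1
after a '-' that does not start a terminator.

```c
#include <stddef.h>

/* Simplified model of the comment-end scan (NOT the real hparser.c code).
 * Searches buf[0..len) for "-->" and returns the offset of its first '-',
 * or -1 if no terminator is found.  `step` is how far the scan advances
 * past a '-' that did not begin a terminator before it retries. */
static long find_comment_end(const char *buf, size_t len, int step)
{
    const char *s = buf;
    const char *end = buf + len;

    while (s < end) {
        /* locate the next candidate '-' */
        while (s < end && *s != '-')
            s++;
        if (s >= end)
            break;

        const char *candidate = s;
        if (end - candidate >= 3 && candidate[1] == '-' && candidate[2] == '>')
            return (long)(candidate - buf);

        /* step == 2 can jump over the '-' that starts the real "-->" */
        s = candidate + step;
    }
    return -1;
}
```

In this model, scanning the comment body "x--->" with step 2 returns -1
(the terminator is skipped), while step 1 returns offset 2.  With an even
run of dashes such as "x---->", both step values find the terminator.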


> Gisle,
> why did $p->decode_text_entities go away in release 2.99_15?

Because I did not like that interface.  With HTML::Parser you now
select "dtext" in argspec if you want decoding and "text" if you don't
want it.  For the attributes in start tags you select "attr" if you
want decoding and "tokens" if you don't want it.

> I don't agree that HTML entities should be automatically decoded;  this
> should be up to the programmer and application of HTML::TokeParser.  I
> personally had to comment out the code for your HTML decoding for my
> application :)  It would be great if you would put it back in; thoughts?

For HTML::TokeParser we have the problem that you only get decoded
attribute values ("attr" preselected).  A patch to HTML::TokeParser
that makes this selectable might be acceptable, but I don't want to
add a general $p->decode_text_entities option again.

Regards,
Gisle