John Stracke <[EMAIL PROTECTED]> writes:
> > HTML::Parser decode entities with the 'dtext' argspec and leave them
> > alone for 'text'.
>
> I'm not specifying dtext, and is getting decoded.
>
> Uh...but I might be using an old form of the interface, with different
> defaults. My subclass's constructor just calls HTML::Parser->new().
For v2, undecoded text should still be the default. But entities _will_
be decoded in attribute values.
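
Here is a minimal sketch of the difference. It assumes the version 3
handler interface (api_version => 3) rather than the v2 subclassing
style, and the sample markup and variable names are made up for
illustration. It collects both the 'text' and the 'dtext' argspec for
the same text, plus one attribute value:

```perl
use strict;
use warnings;
use HTML::Parser ();

# Made-up sample markup: one entity in an attribute, two in the text.
my $html = '<a title="R&amp;D">caf&eacute; &amp; bar</a>';

my ($raw, $decoded, $title) = ('', '', undef);

my $p = HTML::Parser->new(
    api_version => 3,
    # "attr" is a hash ref; its values arrive with entities decoded.
    start_h => [ sub { my ($tagname, $attr) = @_; $title = $attr->{title}; },
                 'tagname, attr' ],
    # "text" is the source text as-is, "dtext" the same text decoded.
    # Text can arrive in several chunks, so append rather than assign.
    text_h  => [ sub { my ($text, $dtext) = @_; $raw .= $text; $decoded .= $dtext; },
                 'text, dtext' ],
);
$p->parse($html);
$p->eof;

print "attr : $title\n";
print "text : $raw\n";
print "dtext: $decoded\n";
```

The 'text' string keeps &eacute; and &amp; as written, while the 'dtext'
string and the attribute value have them replaced by the characters
they stand for.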
If you want UTF8 output then it should just be a matter of
transforming the data to UTF8 afterwards. The Unicode::String module
should be usable here.
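
Something along these lines might do, assuming the decoded data is
Latin-1. The helper name latin1_to_utf8 is hypothetical; latin1() and
the utf8 method are from Unicode::String:

```perl
use strict;
use warnings;
use Unicode::String qw(latin1);

# Hypothetical helper: take a Latin-1 string (for example decoded text
# from the parser) and return it re-encoded as UTF8.
sub latin1_to_utf8 {
    my ($str) = @_;
    return latin1($str)->utf8;
}

# "caf\xE9" is "cafe" with e-acute as a single Latin-1 byte.
my $out = latin1_to_utf8("caf\xE9 & bar");
```

Plain ASCII passes through unchanged, so it is safe to run all parser
output through the same helper.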
For perl 5.6 there will be some problems if the input to the parser is
UTF8, because you then end up with a mix of UTF8 encoded chars and
latin1 chars where entity decoding has taken place.
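
The mix can be sketched like this, under the assumption that
HTML::Entities turns &eacute; into the single Latin-1 byte when handed
a plain byte string. The helper is_valid_utf8 is hypothetical, and it
relies on utf8::decode, which needs perl 5.8 or later:

```perl
use strict;
use warnings;
use HTML::Entities qw(decode_entities);

# Hypothetical helper: true if the byte string is well-formed UTF8.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;    # utf8::decode modifies its argument in place
    return utf8::decode($copy) ? 1 : 0;
}

# UTF8 encoded input (e-acute as the two bytes C3 A9) that also
# contains an entity for the same character.
my $input = "caf\xC3\xA9 &eacute;";

my $mixed = $input;
decode_entities($mixed);    # void context: decodes in place

print "input valid UTF8: ", is_valid_utf8($input), "\n";
print "mixed valid UTF8: ", is_valid_utf8($mixed), "\n";
```

One way around it is to convert the input to a single encoding before
parsing, so that decoded entities and literal characters end up in the
same form.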
Regards,
Gisle