John Gardiner Myers <[EMAIL PROTECTED]> writes:

> the normalization support needs to be optional, defaulting to off.

What do you estimate the overhead would be?

> Better would be Mozilla's universal charset detector, which I would
> have to wrap up as a cpan module.

What license is it under?  We try to avoid requiring additional
external CPAN modules, so we might want to ship it with SpamAssassin
instead, if the license (and ASF policy) allows.
 
> The other issue is that Mail::SpamAssassin::HTML uses two calls to
> pack("C0A*", ...) in order to strip Perl's utf-8 flag from text going
> into and out of HTML::Parser.  When doing charset normalization, these
> two pack calls need to be removed.  In order for HTML::Parser to
> correctly handle utf8, one needs minimum versions of Perl 5.8 and
> HTML::Parser 3.39_90.  HTML::Parser 3.43 might be a better minimum
> version--I haven't reviewed the severity of the utf8 bug fixed in that
> release.  I see two possibilities:
> 
> 1) Condition the two pack calls on version checks: (perl < 5.8 ||
>    HTML::Parser < 3.43)
> 
> 2) Condition the two pack calls on charset normalization disabled.

We can probably safely raise the minimum required HTML::Parser version
in our next major revision.  Conditioning the two pack calls is also okay.
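
To make the conditioning concrete, here is a rough, untested sketch that
combines both of the quoted options: strip the flag whenever normalization
is off, or whenever perl or HTML::Parser is older than the versions
mentioned above.  The sub name and its argument are made up for
illustration and are not existing SpamAssassin code.

```perl
use strict;
use warnings;

# Hypothetical helper: decide whether the pack("C0A*", ...) workaround
# should still be applied.  $normalize is an assumed boolean saying
# whether charset normalization is enabled.
sub should_strip_utf8_flag {
    my ($normalize) = @_;

    # Option 2: normalization disabled, so keep the current behavior.
    return 1 unless $normalize;

    # Option 1: perl itself is too old to handle utf8 here.
    return 1 if $] < 5.008;

    # UNIVERSAL::VERSION dies if the installed module is older than the
    # requested version; a missing module also lands in the failure branch.
    my $parser_ok = eval { require HTML::Parser; HTML::Parser->VERSION(3.43); 1 };
    return $parser_ok ? 0 : 1;
}
```

The two existing calls in Mail::SpamAssassin::HTML would then be guarded
with something like "$text = pack("C0A*", $text) if
should_strip_utf8_flag($normalize);".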

We should pay special attention to behaving the way MUAs do.  I believe
some MUAs will actually ignore the MIME character set and use the one
specified in the message's HTML (if the message is HTML).  We shouldn't
necessarily assume all MUAs have been configured to use the local
character set at all times.
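
As a rough illustration of what following the HTML-declared charset could
look like, here is an untested sketch.  The sub names, the 1024-byte
cutoff, and the "prefer the meta tag" policy are all assumptions for
discussion, not existing SpamAssassin behavior or a statement of what any
particular MUA does.

```perl
use strict;
use warnings;

# Hypothetical helper: return the charset named in an HTML <meta> tag
# (lowercased), or undef if none is found.  Only the first 1024 bytes
# are examined; that cutoff is arbitrary.
sub html_meta_charset {
    my ($html) = @_;
    my $head = substr($html, 0, 1024);
    while ($head =~ /<meta\b([^>]*)>/gi) {
        my $attrs = $1;
        # Matches both charset="..." and content="text/html; charset=..."
        if ($attrs =~ /\bcharset\s*=\s*["']?\s*([A-Za-z0-9._:-]+)/i) {
            return lc $1;
        }
    }
    return;
}

# Assumed policy: prefer the charset declared in the HTML, and fall
# back to the MIME charset when the HTML declares none.
sub effective_charset {
    my ($mime_charset, $html) = @_;
    my $meta = html_meta_charset($html);
    return defined $meta ? $meta : $mime_charset;
}
```

For example, effective_charset('iso-8859-1', '<meta charset=koi8-r>')
would pick koi8-r, while a body with no meta tag keeps the MIME value.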

Daniel

-- 
Daniel Quinlan
http://www.pathname.com/~quinlan/
