[Bug 7144] [REVIEW] To normalize_charset or not to normalize_charset, that is the question.

bugzilla-daemon Mon, 06 Apr 2015 17:28:18 -0700

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7144


--- Comment #4 from Mark Martinec <[email protected]> ---
> In looking at this, if +1's we will need to make Encode::Detect a
> requirement rather than optional

Not necessarily. Encode::Detect is now used only rarely, when other
attempts fail, unlike in 3.4.0, where the module was essential for
operation. I wouldn't even care much for this module, but I kept it
because it has been in use before. It is still flagged as optional in
DependencyInfo.pm, and its importance is played down in the
DependencyInfo report.


> Also need to update the UPGRADE and README to reflect this change
> if we get another +1.

I wonder how effective the current drug-misspelling rules are,
which assume Latin-1 encoding. I haven't noticed any degradation since
I began playing with normalize_charset and turned it on (rendering
them ineffective), but that's just anecdotal.
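
To illustrate why such rules stop matching, here is a minimal sketch in
Python (not SpamAssassin's Perl, and not its actual code). It assumes
that normalization yields UTF-8 and uses a made-up pattern that stands
in for a rule written with Latin-1 byte values; the names
LATIN1_PATTERN and matches are hypothetical.

```python
import re

# Hypothetical stand-in for a stock rule written with Latin-1 in mind:
# byte 0xE1 is "á" in ISO-8859-1.
LATIN1_PATTERN = re.compile(rb"vi\xe1gra", re.IGNORECASE)

text = "Buy Vi\u00e1gra now"

# Body bytes as they arrive in a Latin-1 encoded message.
raw_latin1 = text.encode("iso-8859-1")
# Body bytes after charset normalization (assumed here to produce UTF-8),
# where "á" becomes the two bytes 0xC3 0xA1.
normalized = text.encode("utf-8")


def matches(pattern, data):
    """Return True if the byte pattern is found anywhere in data."""
    return pattern.search(data) is not None


latin1_hit = matches(LATIN1_PATTERN, raw_latin1)
utf8_hit = matches(LATIN1_PATTERN, normalized)
print(latin1_hit, utf8_hit)
```

The same pattern hits the original Latin-1 bytes but misses the
normalized text, because the single byte it looks for is no longer
there.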

Currently I don't see an easy way to let rules know what encoding
they are dealing with, so we can't make them conditional (or tflagged).
One possibility is to use 'rawbody' instead of 'body' for rules
that expect the original encoding of a message. Rawbody avoids charset
normalization, but it also avoids decoding HTML (which may or may not
affect them).

I don't have a strong opinion on the default value of normalize_charset.
For our site I certainly want it on (regardless of possibly rendering
some stock rules ineffective), as it makes it easier to write rules for
non-English text. Perhaps a gentle nudge in the release notes, suggesting
that people turn normalize_charset on when upgrading to 3.4.1, while
leaving the default unchanged for this minor version update? The drawback
is that part of the user base will stay on pre-3.4.1 versions for quite
some time still, yet keep their rules up to date.

