This bug is part of the complex of issues related to smoothing out all the edge and corner cases of character set encoding for v4. There is some concern that changing the default for normalize_charset (to enable it), or even removing the switch altogether, will require nailing down documentation of how to match problem characters like the Latin-1 "extended ASCII" range: basically any 8-bit character >127.

Making the change requires some work on the rules that look for those high-bit-set characters, done by people who understand encoding issues and common failings (e.g. using a 1-byte high-bit-set character in a notionally UTF-8 document). My personal opinion is that the change is worth the work, but I admit that I have not completely audited the default rules for problematic cases. I have, however, been writing rules to work with normalize_charset for many years. With reasonably modern Perl, there's no strong argument for normalize_charset=0 beyond the technical debt of code and rules written to accommodate it.
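
To make the problem concrete, here is a minimal sketch in Python (not SpamAssassin code) of why a rule written against raw bytes can miss the same character in a different encoding, and why matching after normalization does not. The `normalize` helper and the rule names are hypothetical, and the "try UTF-8, fall back to Latin-1" logic is only an assumed simplification of what a real charset normalizer does.

```python
import re

text = "caf\u00e9"  # "café", with U+00E9 as the problem character

# The same text as it might arrive in two differently encoded messages.
latin1_bytes = text.encode("latin-1")  # the final character is the single byte 0xE9
utf8_bytes = text.encode("utf-8")      # the final character is the two bytes 0xC3 0xA9

# Hypothetical rule written against raw bytes with Latin-1 in mind.
byte_rule = re.compile(rb"caf\xe9")

# Hypothetical rule written against characters, for use after normalization.
char_rule = re.compile("caf\u00e9")


def normalize(raw: bytes) -> str:
    """Decode as UTF-8 when valid, otherwise fall back to Latin-1.

    The fallback also covers the common failing of a lone high-bit-set
    byte inside a document that claims to be UTF-8.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


samples = (latin1_bytes, utf8_bytes)
matches_raw = [bool(byte_rule.search(b)) for b in samples]
matches_normalized = [bool(char_rule.search(normalize(b))) for b in samples]

print(matches_raw)         # byte rule hits only the Latin-1 sample
print(matches_normalized)  # character rule hits both samples
```

The byte-oriented rule depends on one particular encoding, while the character-oriented rule sees a single form of the text regardless of how the message was encoded.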


On 15 Apr 2021, at 8:55, bugzilla-dae...@spamassassin.apache.org wrote:

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7656

Bill Cole <billc...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |billc...@apache.org

--- Comment #15 from Bill Cole <billc...@apache.org> ---
(In reply to Henrik Krohns from comment #12)
> Bumping this bug. Comments? Monologs are getting a bit tiresome.. :-)

+1

The minor pain of revamping rules that match non-ASCII characters is compensated by the fact that this is a *normalization* and so reduces the frequency of edge cases that escape rules written (perhaps inadvertently) to depend on a particular subset of possible encodings. My personal experience running SA instances that see a lot of non-ASCII messages is that enabling normalize_charset is a best practice, and the default is basically tech debt.

As for requiring discussion on-list, these comments are sent to the dev list. I'm going to bump it there to get the attention of anyone filtering out Bugzilla mail (!? if that's a thing...) and will also post on the Users list to get a broader audience.

--
You are receiving this mail because:
You are the assignee for the bug.


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire
