This bug is part of the cluster of issues related to smoothing out all
the edge and corner cases of character set encoding for v4. There is
some concern about changing the default for normalize_charset (to enable
it), or even removing the switch altogether, in order to nail down how
to document matching problem characters such as the Latin-1 "extended
ASCII" range: basically any 8-bit character >127.
Making the change requires some work on the rules that look for those
high-bit-set characters, by people who understand encoding issues and
common failings (e.g. a 1-byte high-bit-set character in a notionally
UTF-8 document). My personal opinion is that the change is worth the
work, but I admit that I have not completely audited the default rules
for problematic cases. However, I have been writing rules to work with
normalize_charset for many years. With a reasonably modern Perl, there
is no strong argument for normalize_charset=0 beyond the technical debt
of code and rules written to accommodate it.
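To make the matching problem concrete, here is a minimal Python sketch
(SpamAssassin itself is Perl; Python is used only for illustration). It
shows that "é" is the single byte 0xE9 in Latin-1 but the two bytes
0xC3 0xA9 in UTF-8, so a byte-level rule written against one encoding
misses the other. It also flags the common failing of a lone Latin-1
byte inside a body that claims to be UTF-8. The rule pattern and helper
names are hypothetical, not SpamAssassin APIs.

```python
import re

# The same visible character has different byte forms per charset.
LATIN1_E_ACUTE = "é".encode("latin-1")   # b'\xe9'     (one byte, >127)
UTF8_E_ACUTE = "é".encode("utf-8")       # b'\xc3\xa9' (two bytes)

# Hypothetical byte-level rule written with a Latin-1 message in mind.
latin1_rule = re.compile(rb"caf\xe9")


def matches(rule: re.Pattern, body: bytes) -> bool:
    """Return True if the rule pattern occurs anywhere in the raw body."""
    return rule.search(body) is not None


def has_stray_high_bit_byte(body: bytes) -> bool:
    """True if a body that claims to be UTF-8 is not valid UTF-8,
    e.g. because it contains a lone 1-byte Latin-1 character."""
    try:
        body.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


if __name__ == "__main__":
    latin1_body = "café".encode("latin-1")
    utf8_body = "café".encode("utf-8")
    print(matches(latin1_rule, latin1_body))     # True
    print(matches(latin1_rule, utf8_body))       # False: rule misses UTF-8
    print(has_stray_high_bit_byte(latin1_body))  # True: invalid as UTF-8
    print(has_stray_high_bit_byte(utf8_body))    # False
```

The point is that a rule author working at the byte level has to
anticipate every encoding a sender might use, and has to decide what to
do with bodies that lie about their charset.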
On 15 Apr 2021, at 8:55, bugzilla-dae...@spamassassin.apache.org wrote:
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7656
Bill Cole <billc...@apache.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |billc...@apache.org
--- Comment #15 from Bill Cole <billc...@apache.org> ---
(In reply to Henrik Krohns from comment #12)
Bumping this bug. Comments? Monologs are getting a bit tiresome.. :-)
+1
The minor pain of revamping rules that match non-ASCII characters is
compensated by the fact that this is a *normalization* and so reduces the
frequency of edge cases that escape rules written (perhaps inadvertently)
to depend on a particular subset of possible encodings. My personal
experience running SA instances that see a lot of non-ASCII messages is
that enabling normalize_charset is a best practice, and the default is
basically tech debt.
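A rough Python sketch of what the normalization buys, assuming it
amounts to decoding each text part from its declared charset and
re-encoding it as UTF-8 before rules run. `normalize_part` and
`utf8_rule` are hypothetical names; this is not SpamAssassin's
implementation and makes no attempt at the charset guessing a real
implementation might do when the declared charset is wrong.

```python
import re


def normalize_part(raw: bytes, declared_charset: str) -> bytes:
    """Decode a text part from its declared charset and re-encode as UTF-8.

    Assumption: the declared charset is trusted; undecodable bytes are
    replaced with U+FFFD instead of being guessed at.
    """
    text = raw.decode(declared_charset, errors="replace")
    return text.encode("utf-8")


# Hypothetical rule written once, against UTF-8 text only.
utf8_rule = re.compile("café".encode("utf-8"))

# The same word as it might arrive in three differently encoded parts.
parts = [
    ("café".encode("latin-1"), "latin-1"),
    ("café".encode("utf-8"), "utf-8"),
    ("café".encode("cp1252"), "cp1252"),
]

if __name__ == "__main__":
    for raw, charset in parts:
        hit = utf8_rule.search(normalize_part(raw, charset)) is not None
        print(charset, hit)  # True for every charset after normalization
```

Without the normalization step, the same pattern run on the raw Latin-1
bytes finds nothing, which is the kind of encoding-dependent edge case
that normalization removes.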
As for requiring discussion on-list, these comments are sent to the dev
list. I'm going to bump it there to get the attention of anyone filtering
out Bugzilla mail (!? if that's a thing...) and will also post on the
Users list to get a broader audience.
--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire