[Bug 7656] UTF8 rules, normalize_charset etc overhaul

bugzilla-daemon Sun, 04 Aug 2019 02:03:53 -0700

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7656


--- Comment #4 from Henrik Krohns <apa...@hege.li> ---
So getting back to this.

I've been running my SA with normalize_charset 1 without any ill effects so
far. Should we head towards enabling it by default in 4.0.0?

The only thing left after that would be documenting what format .cf files are
expected to be in. Probably just "bytes" without any special encoding? For
anything other than personal use, pure ASCII should be used for portability
(non-ASCII characters should be written as \xNN escapes, e.g. \xff).
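Here is a minimal sketch of turning a rule pattern containing non-ASCII text into the pure-ASCII \xNN form. The helper name `escape_non_ascii` is hypothetical and not part of SpamAssassin. Note that the escapes you get depend on which encoding you assume the rule author meant.

```python
def escape_non_ascii(text: str, encoding: str = "utf-8") -> str:
    """Hypothetical helper: encode `text` and write every byte >= 0x80
    as a \\xNN escape, so the resulting rule text is pure ASCII."""
    out = []
    for b in text.encode(encoding):
        if b < 0x80:
            out.append(chr(b))          # plain ASCII byte, keep as-is
        else:
            out.append(f"\\x{b:02x}")   # high byte -> \xNN escape
    return "".join(out)


print(escape_non_ascii("bär", "latin-1"))  # b\xe4r
print(escape_non_ascii("bär", "utf-8"))    # b\xc3\xa4r
```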

To be compatible with both normalize_charset 0 and 1, it should be clearly
documented that any rule expected to hit Latin-1 extended characters needs to
be written to match both the Latin-1 and the UTF-8 encoding, e.g. "ä" ->
(?:\xe4|\xc3\xa4). We could also detect this automatically in rules and output
a warning that the rule should be fixed.
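A rough sketch of both ideas: generating the dual Latin-1/UTF-8 alternation for a word, and a crude automatic check that warns when a rule matches sample text in only one of the two encodings. The function names and the sample-text approach to detection are assumptions for illustration, not how SpamAssassin actually lints rules. Python's `re` on bytes stands in for a Perl regex here.

```python
import re
from typing import Optional


def _esc(data: bytes) -> str:
    # Render every byte as a \xNN escape so the rule text stays pure ASCII.
    return "".join(f"\\x{b:02x}" for b in data)


def dual_encoding(ch: str) -> str:
    """Regex fragment matching `ch` encoded as either Latin-1 or UTF-8."""
    if ord(ch) < 0x80:
        return re.escape(ch)  # ASCII is identical in both encodings
    # encode("latin-1") raises UnicodeEncodeError outside U+0000..U+00FF
    return f"(?:{_esc(ch.encode('latin-1'))}|{_esc(ch.encode('utf-8'))})"


def dual_pattern(word: str) -> str:
    """Hypothetical helper: build a pattern for `word` in both encodings."""
    return "".join(dual_encoding(c) for c in word)


def single_encoding_warning(pattern: str, sample: str) -> Optional[str]:
    """Hypothetical check: warn if `pattern` matches `sample` in only one
    of Latin-1 / UTF-8, i.e. it would break when normalize_charset flips."""
    rx = re.compile(pattern.encode("ascii"))
    hits_latin1 = rx.search(sample.encode("latin-1")) is not None
    hits_utf8 = rx.search(sample.encode("utf-8")) is not None
    if hits_latin1 != hits_utf8:
        which = "Latin-1" if hits_latin1 else "UTF-8"
        return f"rule only matches {which} text for {sample!r}"
    return None


print(dual_pattern("bär"))                          # b(?:\xe4|\xc3\xa4)r
print(single_encoding_warning(r"b\xe4r", "bär"))    # warns: Latin-1 only
print(single_encoding_warning(dual_pattern("bär"), "bär"))  # None
```

One trade-off worth noting: the bare Latin-1 alternative can also match inside unrelated UTF-8 multibyte sequences (0xE4, for instance, is a UTF-8 lead byte for 3-byte characters), so dual-encoded rules buy compatibility at some cost in precision.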

One thing to consider would be removing the normalize_charset option entirely
and simply always normalizing everything, plain and simple.
