https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7133
--- Comment #5 from John Wilcock <[email protected]> ---
Firstly, thanks for all the good work so far on this.

Thinking about this purely from the rule-writing user's point of view, totally ignoring the history and largely ignoring the underlying technical details :-)

I would want to be able to include any Unicode characters directly in the rule file, and have it match the equivalent characters in the message, regardless of Content-Type charset, and regardless of any base64, quoted-printable and/or (for HTML message parts) &entity; encoding.

So I should be able to write things like

  body CRAZY_EURO /€uro/
  header SUBJ_CREDIT_FR Subject =~ /crédit/

and match any occurrences of "€uro" or "crédit" regardless of what charset the message was originally encoded in and whether entities were used.

This of course would imply that rule .cf files would need to be encoded in UTF-8 (or whatever) and subjected to charset normalisation. I guess that's a whole new can of worms, but IMO it would make it far easier to address international spam patterns.

After all your efforts to normalise the message, it would be a great shame to have to encode all non-ASCII characters in rules, e.g.

  body CRAZY_EURO /\x{20AC}uro/

though I would of course expect things to work if written that way. It would be an even greater shame if rules had to be written as UTF-8 bytes

  body CRAZY_EURO /\xE2\x82\xACuro/

Next question: what effect (if any) would this have on rawbody rules?
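
To make the behaviour requested above concrete, here is a rough sketch in Python rather than Perl. It is not SpamAssassin code: the normalise() helper, its parameters and the sample payloads are all hypothetical, and it only illustrates the idea that once every message part is decoded to Unicode text, one rule written with literal characters matches all of the encodings.

```python
import base64
import html
import quopri
import re


def normalise(payload: bytes, charset: str,
              transfer_encoding: str = "8bit", is_html: bool = False) -> str:
    """Hypothetical helper: turn one message part into Unicode text."""
    # Undo the Content-Transfer-Encoding first.
    if transfer_encoding == "base64":
        payload = base64.b64decode(payload)
    elif transfer_encoding == "quoted-printable":
        payload = quopri.decodestring(payload)
    # Decode according to the declared Content-Type charset.
    text = payload.decode(charset, errors="replace")
    # For HTML parts, resolve &entity; references as well.
    if is_html:
        text = html.unescape(text)
    return text


# Rules written with literal Unicode characters, as in the examples above.
CRAZY_EURO = re.compile("€uro")
SUBJ_CREDIT_FR = re.compile("crédit")

# The same text arriving in five different encodings (made-up samples).
samples = [
    ("€uro".encode("utf-8"), "utf-8", "8bit", False),
    ("€uro".encode("iso-8859-15"), "iso-8859-15", "8bit", False),
    (base64.b64encode("€uro".encode("utf-8")), "utf-8", "base64", False),
    (b"=E2=82=ACuro", "utf-8", "quoted-printable", False),
    (b"&euro;uro", "us-ascii", "8bit", True),
]
results = [bool(CRAZY_EURO.search(normalise(*s))) for s in samples]

# The escaped code point form is equivalent to the literal character.
escaped_ok = bool(re.search(r"\u20acuro", normalise(*samples[1])))

# A rule written as UTF-8 bytes only matches raw UTF-8 input,
# not the same text sent as ISO-8859-15.
byte_rule = re.compile(rb"\xE2\x82\xACuro")
bytes_utf8 = bool(byte_rule.search(samples[0][0]))
bytes_latin9 = bool(byte_rule.search(samples[1][0]))
```

In this sketch the byte-oriented rule misses the ISO-8859-15 sample, which is the practical argument for matching against normalised text rather than raw bytes. It also hints at why the rawbody question matters: a rule applied before this kind of decoding would still see encoding-dependent bytes.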
