https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7645
Henrik Krohns <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[email protected]

--- Comment #6 from Henrik Krohns <[email protected]> ---
There are some utf8 rules, for example (I've used "cat -v" to print them):

body HS_BODY_899 /The seller hasnM-CM-"M-bM-^BM-,M-bM-^DM-"t provided any postage details yet/
body HS_BODY_1575 /diesem Grund folgende Zahlung zu stornieren. Um den dafM-CM-<r nM-CM-6tigen/

Basically the wide print error comes from outputting "scanner1.re", which ends
up containing:

char *Mail_SpamAssassin_CompiledRegexps_body_0_scan1(unsigned char **p){
  unsigned char *q = 1 + *p;
/*!re2c
  "diesem grund folgende zahlung zu stornieren" {RET("HS_BODY_1575,[l=1]");}
  "the seller hasnâ" {RET("HS_BODY_899,[l=1]");}
  [\000-\377] { return NULL; }
*/

Not sure if we should just print with binmode utf8 or similar, so that the utf8
characters end up in scanner1.re as they are, or perhaps convert them first to
some hex \xAB value. I guess this depends on what re2c is expecting.

I'm not sure what state utf8 rules/checks are in anyway. If there isn't one
already, we should have some docs/bug describing all the steps from reading a
.cf with utf8 rules to how the rule is stored and matched against the decoded
body (which is, or is not utf8?), and also how sa-compile fits into all of this.
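To make the "convert them first to some hex \xAB value" option concrete, here is a minimal sketch in Python (not the Perl that sa-compile is written in). The function name `escape_non_ascii` is hypothetical, and the sketch assumes that re2c accepts `\xNN` escapes inside string literals, which the comment above leaves open. It encodes the pattern text as UTF-8 and emits every byte outside printable ASCII as an escape, so the output file contains only ASCII and no wide characters are ever printed.

```python
def escape_non_ascii(text: str) -> str:
    """Return text as ASCII, with non-printable/non-ASCII UTF-8 bytes as \\xNN.

    Hypothetical helper for illustration only; assumes the consumer
    (here: re2c) understands \\xNN escapes in string literals.
    """
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if ch in ('"', "\\"):
            # Keep quote and backslash usable inside a quoted literal.
            out.append("\\" + ch)
        elif 0x20 <= byte < 0x7F:
            # Printable ASCII passes through unchanged.
            out.append(ch)
        else:
            # Everything else becomes a two-digit hex escape.
            out.append("\\x%02X" % byte)
    return "".join(out)


if __name__ == "__main__":
    # Fragment of the HS_BODY_1575 pattern, lowercased as in scanner1.re.
    print(escape_non_ascii("den dafür nötigen"))
```

The other option from the comment, writing the file through a UTF-8 output layer, would leave the raw multi-byte sequences in scanner1.re instead. Which of the two is appropriate depends on how re2c interprets its input, which is the open question here.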
