https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6226
--- Comment #4 from John Hardin <[email protected]> 2009-10-25 09:10:01 UTC ---

It's not so much a matter of the encoding of the message itself as of the presence of 8-bit characters in the match strings (e.g. somebody typed an iso-8859-1 accented character directly into an unencoded 8-bit message body, and some rule hits it).

The UTF-8 encoding is done to temporarily armor the match-string results for IPC, so that the entire masscheck process doesn't hang when the IO library routines encounter a wide character.

Perhaps the _proper_ solution involves encoding the match strings much earlier in the process, when the scan generates them and the original encoding of the message (if any) is known. I didn't dig that deeply into it.

I should have posted this earlier. Here's the tail of the log output from an unpatched masscheck run using -j>1 when an unencoded wide character appears in the match text. After this point all of the masscheck processes are still present, but sit idle until killed.

-------------------BEGIN LOG
.....status: 26% ham: 1361 spam: 953 date: 2005-12-02 now: 2009-10-24 01:56:28 PM
.... . status: 27% ham: 1413 spam: 990 date: 2007-06-29 now: 2009-10-24 01:57:21 PM
Wide character in print at /usr/lib64/perl5/5.8.8/x86_64-linux/IO/Handle.pm line 401.
-------------------END LOG

IO/Handle.pm:

398 sub print {
399     @_ or croak 'usage: $io->print(ARGS)';
400     my $this = shift;
401     print $this @_;
402 }

When running without -j there are "Wide character in print" warnings, but the masscheck process runs to completion.
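A minimal Perl sketch of the armoring idea described above, assuming the match strings are encoded with Encode's encode_utf8 before being written to the IPC handle and decoded with decode_utf8 on the reading side. The helper names armor_match and unarmor_match are hypothetical, not masscheck's actual functions, and an in-memory filehandle stands in for the real pipe. Perl emits "Wide character in print" only for characters above U+00FF printed to a handle with no encoding layer; after encode_utf8 every character is a byte, so the warning cannot fire.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);

# Hypothetical helper: turn a (possibly wide) character string into UTF-8 octets.
sub armor_match {
    my ($str) = @_;
    return encode_utf8($str);    # every resulting char is <= 0xFF
}

# Hypothetical helper: turn UTF-8 octets back into a character string.
sub unarmor_match {
    my ($bytes) = @_;
    return decode_utf8($bytes);
}

# Latin-1 "e acute" plus a wide char (U+263A) that would trigger the warning.
my $match   = "caf\x{e9} \x{263A}";
my $armored = armor_match($match);

# Stand-in for the IPC pipe: an in-memory handle with no encoding layer.
my $buf = '';
open(my $fh, '>', \$buf) or die "open: $!";
{
    # Make "Wide character" warnings fatal here, so an unarmored string would die.
    use warnings FATAL => 'utf8';
    print $fh $armored, "\n";
}
close $fh;

# Reading side: strip the newline and decode back to characters.
chomp(my $line = $buf);
my $roundtrip = unarmor_match($line);
print $roundtrip eq $match ? "round-trip ok\n" : "mismatch\n";    # prints "round-trip ok"
```

Printing $match directly inside the FATAL block, instead of $armored, would die with "Wide character in print", which is the condition the log above shows.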
