https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6226

--- Comment #4 from John Hardin <[email protected]> 2009-10-25 09:10:01 UTC ---
It's not a matter of the encoding of the message itself so much as it is the
presence of 8-bit characters in the match strings (e.g. somebody directly typed
an iso-8859-1 accented character into an unencoded 8-bit message body, and that
gets hit by some rule).

The UTF-8 encoding temporarily armors the match strings for IPC, so that the
entire masscheck process does not hang when the IO library routines encounter a
wide character. Perhaps the _proper_ solution involves encoding the match
strings much earlier in the process, when the scan generates them and the
original encoding of the message (if any) is known. I didn't dig that deeply
into it.
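
For illustration only, here is a minimal sketch of the armoring idea. It is not
the actual masscheck patch: the helper names armor_for_ipc and unarmor_from_ipc
are hypothetical, and a plain pipe() stands in for the parent/child channel. It
assumes the core Encode module, turns a string that may contain wide characters
into UTF-8 octets before writing it to the pipe, and decodes it again on the
reading side.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: convert a Perl character string into UTF-8 octets
# so that print() on a byte-oriented handle never sees a wide character.
sub armor_for_ipc {
    my ($str) = @_;
    return encode('UTF-8', $str);
}

# Hypothetical helper: reverse the armoring on the receiving side.
sub unarmor_from_ipc {
    my ($octets) = @_;
    return decode('UTF-8', $octets);
}

# Example match string containing a code point above 0xFF.
my $match  = "caf\x{e9} \x{263a}";
my $octets = armor_for_ipc($match);

# A pipe stands in for the IPC channel between masscheck processes.
pipe(my $reader, my $writer) or die "pipe failed: $!";
print {$writer} $octets, "\n";
close $writer;
chomp(my $line = <$reader>);
close $reader;

my $roundtrip = unarmor_from_ipc($line);
```

Because only octets cross the pipe, the writing side has nothing wide to
complain about, and the reader gets the original characters back after
decoding.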


I should have posted this earlier - here's the tail of the log output from
unpatched masscheck using -j>1 when an unencoded wide character appears in the
match text. After this point all of the masscheck processes are present, but
idle until killed.

-------------------BEGIN LOG
.....status:  26% ham: 1361   spam: 953    date: 2005-12-02   now: 2009-10-24 01:56:28 PM
....
.
status:  27% ham: 1413   spam: 990    date: 2007-06-29   now: 2009-10-24 01:57:21 PM
Wide character in print at /usr/lib64/perl5/5.8.8/x86_64-linux/IO/Handle.pm line 401.
-------------------END LOG

IO/Handle.pm:
398 sub print {
399     @_ or croak 'usage: $io->print(ARGS)';
400     my $this = shift;
401     print $this @_;
402 }

When running without -j there are "Wide character in print" warnings, but the
masscheck process runs to completion.
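
The warning itself is easy to reproduce outside of masscheck. The following is
a small sketch under my own assumptions (it writes to the null device and
captures warnings in an array); it shows only the warning, not the hang seen
with -j. Printing a wide character to a handle with no encoding layer warns,
while the same print after adding a UTF-8 layer does not.

```perl
use strict;
use warnings;
use File::Spec;

# Collect warnings instead of letting them go to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

open(my $out, '>', File::Spec->devnull) or die "open failed: $!";

# No encoding layer on the handle: Perl emits "Wide character in print".
print {$out} "\x{263a}\n";

# With an explicit UTF-8 layer the same print is silent.
binmode($out, ':encoding(UTF-8)') or die "binmode failed: $!";
print {$out} "\x{263a}\n";
close $out;
```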
