http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4078





------- Additional Comments From [EMAIL PROTECTED]  2007-02-17 04:03 -------
I just noticed that my comment 12 doesn't really address comment 11, which does
need an answer. This bug was opened for the example attached in comment 4, in
which only the From and Subject headers contain hibit characters from the
Windows-1255 charset; the body specifies Windows-1255 but contains only Roman
alphabet characters.

It seems to me that we should not reject mail because the From header is in some
foreign charset. That is what you would get from someone whose native language
is, for example, Hebrew, but who is sending email in English.

To catch a Hebrew Subject, we would have to add a test to the locale check in
HeaderEval.pm: the charset is WINDOWS-1255 and a majority of the characters in
the Subject header are hibit. I also think the locale test in HeaderEval.pm
should not test the From header, as it currently does.
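
As a rough illustration of the Subject test described above, here is a minimal
sketch in Python. It is not the Perl code in HeaderEval.pm. The function names,
the 0.5 threshold standing in for "a majority", and the choice to ignore
whitespace are all assumptions made for the example. The existing locale check
is left out, and the From header is deliberately not examined.

```python
def hibit_ratio(raw: bytes) -> float:
    """Fraction of non-whitespace bytes that have the high bit set."""
    # Ignoring whitespace is an assumption; spaces between Hebrew words
    # would otherwise dilute the ratio.
    significant = [b for b in raw if b not in b" \t\r\n"]
    if not significant:
        return 0.0
    return sum(1 for b in significant if b >= 0x80) / len(significant)


def subject_looks_foreign(subject: bytes, charset: str,
                          threshold: float = 0.5) -> bool:
    """Hypothetical test: a WINDOWS-* charset and mostly hibit Subject bytes."""
    if not charset.upper().startswith("WINDOWS-"):
        return False
    return hibit_ratio(subject) > threshold
```

For example, `subject_looks_foreign("\u05e9\u05dc\u05d5\u05dd".encode("cp1255"), "windows-1255")`
is true for a Hebrew Subject, while an English Subject sent with the same
declared charset is not flagged.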

There is also a test for locale in HTMLEval.pm. It, too, fails to catch
WINDOWS-* charsets such as WINDOWS-1255, which is used for Hebrew. But to check
for hibit characters there, we would have to test against the text portion of
the HTML.
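
To show what testing against the text portion of the HTML could look like, here
is a minimal Python sketch. Again, this is not the Perl in HTMLEval.pm, and the
class and function names are hypothetical. It assumes a single-byte charset such
as Windows-1255, so that a non-ASCII character after decoding corresponds to one
hibit byte in the original message. It also assumes that script and style
content should not count as text.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects text nodes, skipping the contents of script and style."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_text_hibit_ratio(html_bytes: bytes, charset: str = "cp1255") -> float:
    """Fraction of non-whitespace text characters that are non-ASCII."""
    parser = TextCollector()
    parser.feed(html_bytes.decode(charset, errors="replace"))
    parser.close()
    text = "".join(parser.chunks)
    significant = [c for c in text if not c.isspace()]
    if not significant:
        return 0.0
    return sum(1 for c in significant if ord(c) >= 0x80) / len(significant)
```

Because tag names and attributes are ASCII, measuring the ratio over the raw
HTML would understate how much of the visible text is Hebrew; extracting the
text first avoids that.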

I think that the changes I suggested in comment 8 are pretty safe, but they only
help to catch non-Roman languages in plain text bodies. The changes for headers
and HTML, I think, will have to be tested against corpora to see how they perform.




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
