https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7656

--- Comment #3 from Henrik Krohns <h...@hege.li> ---
(In reply to Henrik Krohns from comment #0)
> latin1 message, no ct     RULE_LATIN1 / <no hits>
> latin1 message, utf8 ct   RULE_LATIN1 / <no hits>
> latin1 message, no ct     RULE_UTF8 / <no hits>
> latin1 message, utf8 ct   RULE_UTF8 / <no hits>

OK, these should now be fixed.

Basically, Encode::Detect::Detector thinks the body "päivää" is Windows-1255
(Hebrew!!).

dbg: message: failed decoding as declared charset UTF-8
dbg: message: decoded as detected charset windows-1255, declared UTF-8
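
What those two debug lines describe can be reproduced at the byte level. The following is only a rough Python sketch, not SpamAssassin code (SA is Perl), and the variable names are made up for illustration. It assumes the body was sent as Latin-1 bytes: they are not valid UTF-8, and decoding them as windows-1255 turns every \xe4 into a Hebrew letter instead of ä.

```python
# Illustration only; SpamAssassin itself is Perl and does not contain this code.
raw = "p\u00e4iv\u00e4\u00e4".encode("latin-1")   # "päivää" as Latin-1 bytes: b'p\xe4iv\xe4\xe4'

# Step 1: the declared charset (UTF-8) fails, because \xe4 starts a
# multi-byte sequence and is not followed by continuation bytes.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# Step 2: what a windows-1255 guess produces versus what the sender meant.
as_hebrew = raw.decode("cp1255")   # high bytes land in the Hebrew block
as_latin = raw.decode("cp1252")    # the intended text

non_ascii = [ch for ch in as_hebrew if ord(ch) > 0x7F]
```

Every character in non_ascii falls in the Hebrew Unicode block, which is why the rules written against the Latin text never hit.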

Why are we using a module that hasn't been updated in 10 years anyway? Maybe
look at Encode::Guess, which has been in core at least since 5.8.8?

I simply added Latin diacritic letters to SA's own basic Windows-1252
detection. I borrowed the \xc0-\xd6\xd8-\xde\xe0-\xf6\xf8-\xfe bit from
textcat; judging by https://en.wikipedia.org/wiki/Windows-1252 it looks
correct. Not sure whether the missing ÿ (\xff) should be added here and in
textcat.
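
For readers who want to see what the byte class covers, here is a minimal Python sketch of the idea. It is not the committed Node.pm change: decode_body and LATIN_LETTERS are hypothetical names, and the fallback order (declared charset first, then the Windows-1252 check, then plain Latin-1) is an assumption made for the example.

```python
import re

# Byte class quoted in the comment above (borrowed from textcat): the accented
# Latin letters of Windows-1252. It skips \xd7 (×), \xdf (ß), \xf7 (÷) and \xff (ÿ).
LATIN_LETTERS = re.compile(rb"[\xc0-\xd6\xd8-\xde\xe0-\xf6\xf8-\xfe]")

def decode_body(raw, declared="utf-8"):
    """Hypothetical helper: return (text, charset_used)."""
    # Try the declared charset first; an unknown charset name raises LookupError.
    try:
        return raw.decode(declared), declared
    except (UnicodeDecodeError, LookupError):
        pass
    # Assumed fallback: if any byte is one of the Latin letters above, treat
    # the body as Windows-1252 rather than asking a statistical detector.
    if LATIN_LETTERS.search(raw):
        # cp1252 leaves a few bytes undefined, hence errors="replace".
        return raw.decode("cp1252", errors="replace"), "cp1252"
    # Latin-1 maps every byte, so this last resort cannot fail.
    return raw.decode("latin-1"), "latin-1"

text, charset = decode_body(b"p\xe4iv\xe4\xe4")
```

With this sketch the Latin-1 bytes of "päivää" come back as Windows-1252 text, while a lone \xff would not trigger the check, which is the open ÿ question raised above.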

Sending        spamassassin-3.4/lib/Mail/SpamAssassin/Message/Node.pm
Sending        trunk/lib/Mail/SpamAssassin/Message/Node.pm
Transmitting file data ..done
Committing transaction...
Committed revision 1846805.
